A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)

. The CF (Climate and Forecast) metadata conventions are designed to promote the creation, processing, and sharing of climate and forecasting data using Network Common Data Form (netCDF) ﬁles and libraries. The CF conventions provide a description of the physical meaning of data and of their spatial and temporal properties, but they depend on the netCDF ﬁle encoding which can currently only be fully understood and interpreted by someone familiar with the rules and relationships speciﬁed in the conventions documentation. To aid in development of CF-compliant software and to capture with a minimal set of elements all of the information contained in the CF conventions, we propose a formal data model for CF which is independent of netCDF and describes all possible CF-compliant data. Because such data will often be analysed and visualised using software based on other data models, we compare our CF data model with the ISO 19123 coverage model, the Open Geospatial Consortium CF netCDF standard, and the Unidata Common Data Model. To demonstrate that this CF data model can in fact be


Introduction
Network Common Data Form (netCDF) supports a view of data as a collection of self-describing, portable objects that can be accessed through standardised software libraries.For climate scientists, as well as others, it has become a popu-lar way to create, access, and share array-orientated scientific data (Rew and Davis, 1990;Rew et al., 2006).In this context, "self-describing" means that a file contains, for each data array, an associated description of what it represents scientifically, i.e. metadata.NetCDF was developed and is maintained at Unidata, part of the US University Corporation for Atmospheric Research (UCAR).
The CF (Climate and Forecast) metadata conventions (Eaton et al., 2011, http://cfconventions.org) are a set of rules for storing geoscientific data in netCDF files, with the aims of describing the data, enabling users to identify comparable data held in different files, and facilitating the development of software to extract, process, analyse, and display the data.Initially CF was developed for gridded data from climate and forecast models of the atmosphere and ocean, but its use has subsequently extended to other geosciences, and to observations as well as numerical models.The use of CF is recommended where applicable by Unidata.
CF metadata are designed to be interpretable without reference to external tables, readable by humans, easily parsable by programmes, and minimally redundant, which reduces the potential for inconsistencies.The development of CF began in 1999 and has proceeded incrementally, with new features added only when called for by common use cases, and with consideration of how the change might impact data producers.Archival of data is a major purpose of netCDF, for which reason backwards compatibility is an important consideration.So far, no backwards-incompatible change has been made to the CF conventions, meaning that a file written with a previous release of CF would still be compliant with the most recent version.
In general, CF tells data producers how they can provide information they think is important for understanding their data, but it mandates very little metadata.Projects which recommend or require the use of the CF conventions may of course impose additional requirements on data producers, as is done, for instance, by the Coupled Model Intercomparison Project (https://pcmdi.llnl.gov/mips/cmip5/CMIP5_output_metadata_requirements.pdf), which in its fifth phase (CMIP5) serves more than 4.2 petabytes of CF-compliant netCDF (CF-netCDF) data.
In this paper, we present a data model which is based on the version of the CF conventions (1.6) which was the latest release at the time of writing.(Since then, 1.7 has been released; see Sect.7.) By a "data model", we mean an abstract interpretation of the data, that identifies the elements of the dataset and their scientific intent, and describes how they are related to one another and to the real or model world from which the data were derived.A data model is necessary because it imposes the rules, constraints, and relationships connecting metadata to the data that are needed to imagine how the quantities included in the dataset should be combined and processed scientifically.
The netCDF interface that underlies CF has an explicit data model (the yellow layer in Fig. 1).CF is defined by the CF-netCDF conventions (the blue layer).The conventions have been widely adopted and there are many software applications that work with CF datasets, but up to now a comprehensive CF data model has not been explicitly proposed (the Open Geospatial Consortium is discussed in Sect.5.2).Those writing software to process CF-compliant files have implicitly or explicitly adopted data models to serve their own needs, not necessarily considering the whole of the CF convention.Possible CF data models may differ regarding concepts which are more abstract than the storage syntax of netCDF files, and which are therefore not spelled out by the CF convention but become relevant when the data are manipulated or visualised.Divergent interpretations of the data can lead to misunderstandings, inconsistencies, and inefficiencies, and impair the linking of independently developed software tools which might be needed together for the analysis of CF data.
Our aim is to create an explicit data model for CF (the green layer in Fig. 1) to provide an interpretation of the conceptual structure of CF which is consistent, comprehensive, and as far as possible independent of the netCDF data model (the yellow layer).We believe that an explicit comprehensive data model will lead to the CF conventions being better understood, will provide guidance during the development of future extensions to the CF conventions, and will help software developers to design CF-compliant data-processing applications and to build interfaces to other explicit data models.

Without a data model:
With a data model: The benefits of having a CF data model.The CF-netCDF conventions (CN) rely on the netCDF data model (NC) and, at present, a software application is forced to make its own interpretation of the CF-netCDF conventions -an interpretation that is likely to be different from that of other applications.The aim of this paper is to propose a comprehensive CF data model that provides a consistent interpretation of the conventions, thereby facilitating compatibility across applications that adopt it.This by no means precludes a variety of software implementations, as there is still considerable flexibility in mapping data model elements onto the data objects needed for a particular application.

Design criteria for a CF data model
The primary requirement of a data model is that it should be able to describe all existing and conceivable CF-compliant datasets.If we have been successful, then software libraries that adopt our CF data model in constructing their internal data structures will be able to represent and manipulate any CF-compliant dataset.
For our data model, we define a minimal set of elements that are sufficient for accommodating all aspects of the CF conventions.We restrict the elements of a data model to those that are explicitly mentioned in CF, but our data model elements do not have to be irreducible in that a data model element could describe more than one CF entity.For example, in CF, coordinates and coordinate bounds are distinct entities, but coordinate bounds cannot exist without coordinates.Therefore, it makes sense in our data model to group them into a single element.
Similarly, while it is possible to introduce additional elements not presently needed or used by CF, we believe this would not be desirable because it would increase the likelihood of a data model becoming outdated or inconsistent with future versions of CF.
The CF data model should also be independent of the encoding, meaning that it should not be constrained by the parts of the CF conventions which describe explicitly how to store (i.e.encode) metadata in a netCDF file.The virtue of this is that should netCDF ever fail to meet the community needs, we shall already have set the groundwork for applying CF to other file formats.

Layout of the paper
In Sect.3, we introduce the key elements of the CF conventions and describe how they are encoded in netCDF files.The relationships between the elements of the CF conventions and our proposed CF data model are described in Sect. 4. This data model is compared with other data models in Sect.5, and a software implementation is presented in Sect.6.How our CF data model and its software implementation may evolve is discussed in Sect.7, and a summary and conclusions are given in Sect.8.

The netCDF data model
The existing CF conventions are for use with netCDF files following the netCDF "classic" data model (the yellow layer in Fig. 1).A brief summary of this explicit data model is useful since the CF conventions cannot be described without reference to elements of netCDF.
The netCDF classic data model is described using Unified Modeling Language (UML) in Fig. 2. UML provides a standard way to visualise the components of a system and how they relate to each other.In UML, different styles of arrows denote different types of relationship (Table 1).Appendix A provides a primer on the subset of UML used in this paper, and is recommended for readers new to this style of diagram.
NetCDF classic files contain data in named variables, which can be single numbers (with no dimensions), onedimensional arrays (vectors), or multi-dimensional arrays, and the dimensions are declared by name in the file.Variables can be of integer, floating point, or character data types.Variables may have attributes, of any data type, attached.Attributes can have a single value or consist of a one-dimensional array.NetCDF files also have "global" file attributes which provide information about the dataset as a whole.NetCDF library software has functions to define dimensions, variables, and attributes, and write and read data.
It is important to appreciate that netCDF itself has no other semantics; for example, while coordinates can be stored in variables and described by attributes, the meanings of these variables and attributes and relationships between them and the variables containing data are not defined by netCDF.NetCDF makes no prescriptions or restrictions regarding the type of metadata which may be stored in the simple data structures that it offers.This flexibility is intended to provide a scope for users and scientific disciplines to develop their own conventions for encoding semantics so that datasets are sufficiently described by those who create them and that they remain valid for those who store and use them.CF is an example of this. .Key components of the netCDF classic data model (corresponding to the yellow "NC" layer in Fig. 1) described using UML (Appendix A).Files consist of global attributes, dimensions, and variables.Variables contain attributes and data, and attributes also contain data.Variables, attributes, and dimensions all contain properties, such as a "name" which identifies them in the file.A data array has a data type for all of its elements (e.g."double" for 64-bit floating point numbers).
The original classic netCDF data model has been "enhanced" with the addition of several new features, including the ability to organise variables in hierarchical groups.Here, we adopt only one of the new features: we regard the character string as a data type, whereas the classic model treats strings as arrays of individual characters.Logically, these treatments are equivalent, but because strings are easier to manipulate in software codes, it is very likely that they will become a part of CF in the future.

The CF conventions
In this section, we briefly describe how the most important of the CF conventions are encoded in netCDF files.We do not consider any conventions accepted after version 1.6 (see Sect. 7 for a discussion on the inclusion of newer conventions).The comprehensive definition of CF, which includes many extra details, can be found on the CF website (http://cfconventions.org).A netCDF file that encodes an example of each aspect of the conventions that we describe is shown in Fig. 3.This example file is presented in Common Data Language (CDL; Rew et al., 1997) -a human-readable notation for netCDF data that is easily produced by netCDF library software.In order to reduce storage occupied by netCDF files, the CF conventions provide for lossy packing of data values and non-lossy compression eliminating missing data values.Although practically valuable, these mechanisms do not affect our conceptual data model and so we have chosen to describe them in Appendix B rather than in this section.

Conventions from the netCDF user guide and COARDS
Unidata provides a netCDF user guide (NUG) (Rew et al., 1997), in which they propose some netCDF conventions.CF makes use of these conventions, so we regard them as part of the CF conventions as well (the blue layer in Fig. 1), and they are illustrated in Fig. 3.A one-dimensional variable that has the same name as its dimension (z, x, and y in Fig. 3) is regarded as a coordinate variable.Since CF introduces other types of variables for coordinate data, we sometimes refer to the kind defined in the netCDF user guide as a coordinate variable "in the NUG sense".(By the phrase "coordinate variable", the CF standard document consistently means "coordinate variable in the NUG sense".)We describe the various kinds of coordinate variables in more detail later (Sect.3.3).The netCDF user guide proposes a number of conventional attributes; some of these are explicitly included in CF.The guide also contains a statement that it is always allowable to make use of attributes that are not standardised by CF.Some of the Unidata attributes recognised by CF contain scientific metadata, e.g.source, for the provenance of the data, and the units of data values (further discussed in Sect.3.8).Others concern the encoding in netCDF files, e.g.Conventions, stating the netCDF conventions to which the file adheres (line 73 of Fig. 3) and the specification of a missing data value with missing_value (line 53 of Fig. 3, indicating that values of −10 30 correspond to cells for which no data are available, such as ocean points for a quantity measured only over land).When originally conceived, CF was an extension of the pre-existing COARDS (Cooperative Ocean/Atmosphere Research Data Service) netCDF conventions (http://www.ferret.noaa.gov/noaa_coop/coop_cdf_profile.html).For the sake of backward compatibility of datasets, although CF is now much more comprehensive and flexible than COARDS, CF explicitly upholds some COARDS conventions, which we therefore regard also as part of CF.

The data and the domain
The overarching purpose of the conventions is to provide conforming datasets with sufficient metadata that they are self-describing, in the sense that each variable in the file has an associated description of what it represents, and that each value can be located (usually in space and time).To meet this objective, we define a data variable V (which might, for example, represent air temperature), over a domain d, where d represents a set of discrete "locations" in what generally would be a multi-dimensional space, either in the real world or in a model's simulated world.Thus, V is a function of all its independent dimensions.For example, a variable that is a function of physical location alone would have a three-dimensional discretised domain, comprising discretised axes of height (z), latitude (y), and longitude (x) (Fig. 4).In CF, the domain may have fewer than three spatial axes, and it may also have any number of non-spatial axes, as in the common case of a variable that is a function time (t).A CF-netCDF file may contain N data variables and M domains, where M ≤ N. Conversely, this means that a given domain may have one or more data variables defined at each of its locations.For instance, there could be values for both air temperature and relative humidity at each location in the domain d(z, y, x).
In CF-netCDF, the values and the description of V are stored in a netCDF variable, called a "data variable".The concept of a domain is not mentioned in the CF conventions, because it does not correspond to any single entity in the netCDF file.The domain is stored in a number of other // PTOP:units = "Pa" ; // double temp(z, y, x) ; // Field construct temp.missing_value = 1.0e30 ; // temp:standard_name = "air_temperature" ; // temp:units = "K" ; // temp:cell_methods = "t: mean (interval: 1 day)" ; // Cell method construct temp:coordinates = "t lat lon" ; // temp:cell_measures = "area: cell_area" ; // temp:grid_mapping = "lambert_conformal" ; // temp:ancillary_variables = "temp_error_limit" ; // double total_wv(y, x) ; // Field construct total_wv:standard_name = "atmosphere_mass_content_of_water_vapor" ; // total_wv:units = "kg m2" ; // total_wv:cell_methods = "t: maximum" ; // Cell method construct total_wv:coordinates = "t lat lon" ; // total_wv:cell_measures = "area: cell_area" ; // total_wv:grid_mapping = "lambert_conformal" ; // double temp_error_limit(z, y, x) ; // Field ancillary construct temp_error_limit:standard_name = "air_temperature standard_error" ; // temp_error_limit:units = "K" ; /  variables and attributes that are linked to the data variable in various ways defined by the conventions.For instance, temp is a data variable (line 52 in Fig. 3) with a four-dimensional domain.Its z, y, x, and t dimensions each have an associated coordinate variable specifying the location at each point along the dimension (the t dimension being implied by the t scalar coordinate variable; see Sect.3.3 for details).Note that it is not possible to store a domain in the absence of any data variables (i.e.N ≥ 1), because in the absence of a data variable, CF-netCDF lacks mechanisms to associate the variables from which a domain could be defined.This does not mean that it is disallowed to create a dataset that contains only elements of a domain, but rather that the CF conventions only allow for them to be interpreted collectively as a domain when they are associated with at least one data variable (see Sect. "Interpreting CF-netCDF files" for an example and further discussion).
Within a CF-netCDF file, dimensions and coordinate variables may be used in the definition of multiple domains, thus reducing redundancy.In our example file, total_wv is a data variable containing the vertical integral of atmospheric water vapour (line 61 in Fig. 3) that has a different domain than the temp data variable.Its t, y, and x dimensions and their coordinates (see Sect. 3.3) are identical to those of temp, so they need not be replicated, but it does not require the z dimension.

Dimensions and coordinates
NetCDF dimensions establish the size of the index space of data variables, e.g.lines 3-5 in Fig. 3, which specify sizes of 106, 110, and 20 for the x, y, and z dimensions, respectively.Each point of the domain of temp is thus defined by a unique set of three indices i = 0, . . ., 105, j = 0, . . ., 109, and k = 0, . . ., 19.NetCDF coordinate variables (in the NUG sense) supply the independent variables on which the data depend.Coordinate variables must be numeric and strictly monotonic, so that each element has a unique value.In our example, we have three coordinate variables, with values z(k), y(j ), and x(i).Each dimension, with its coordinate variable if it has one, constitutes an axis of the multidimensional space of the domain.The CF conventions quite often use the word "axis" to refer to the physical interpretation of the dimensions of the data.
In many cases, each dimension of a domain can be fully described by a single, strictly monotonic coordinate variable (e.g.time, height, latitude, longitude).For more complicated cases, however, such as parametric vertical coordinates (e.g.dimensionless atmosphere sigma coordinates), CF provides a way to record how to compute, from the original dimensional coordinates, dimensional coordinates identifying the location of the data in physical space (in the case of sigma, the air pressure).This information is encoded with the standard_name and formula_terms attributes of a parametric coordinate variable, e.g.lines 14 and 17 in Fig. 3.The standard_name attribute defines the formula for calculating the dimensional coordinates, which needs to be looked up in the CF conventions document, and the for-mula_terms attribute specifies the values of the formula's terms.The atmosphere_sigma_coordinate formula specified in Fig. 3 calculates air pressure from ptop+sigma * (ps − ptop), where the values of "ptop" (pressure at the top of the model), "sigma" (the dimensionless coordinates), and "ps" (surface pressure) are taken from the netCDF variables referenced by the formula_terms attribute.
CF also defines "auxiliary coordinate variables" to provide mandatory or optional coordinate information which is additional or alternative to that contained in the coordinate variables in the NUG sense.Auxiliary coordinate variables can be string valued, may contain missing values, and are not necessarily monotonic.For example, we might like to associate the coordinates of a vertical axis with model level number as well as sigma coordinate or to provide location information and station names for the points in a time series (as in Fig. 5).An auxiliary coordinate variable is encoded as a netCDF variable that is referenced by the coordinates attribute of a data variable and spans at least one of that data variable's dimensions, e.g.lines 27, 30, and 57 in Fig. 3. Coordinate variables (in the NUG sense) and auxiliary coordinate variables (defined by CF) rely on different semantics, and the latter is not a special type of the former, even though they share many characteristics.
An important and mandatory use of auxiliary coordinates is to supply latitude and longitude locations of each point when the horizontal axes of a grid are themselves not latitude and longitude (e.g. if they refer to a rotated North Pole or are based on a map projection, as is the case for x and y in the example of Fig. 3, and sketched in Fig. 6).For this case, the latitude and longitude auxiliary coordinate variables are two-dimensional and can be used to indirectly locate a point horizontally, so that V = V (y j , x i ) with longitude = longitude(y j , x i ), and latitude = latitude(y j , x i ), as in lines 27 and 30 of Fig. 3. Given this information, the data can be located in space by generic applications even if they are ignorant of the rules used to construct the projection.However, CF also provides for information about the grid construction to be included in a netCDF file by defining a "grid mapping" variable, which is refer- enced by the grid_mapping attribute of a data variable, e.g.lines 41 and 59 in Fig. 3. Some axes have only a single coordinate value.Regrettably, single-valued coordinates are often omitted from metadata, although they are very useful; for example, the time information for a field sampled at a single time, for instance, 12:15 Z on 14 July 2015, or the level of a singlelevel field, e.g.air temperature at a height of 1.5 m (Fig. 4).For convenience in storing single-valued coordinates, CF defines a third type of variable containing coordinate data, namely a "scalar coordinate variable", which requires less netCDF machinery than a dimension of size unity.It is a zero-dimensional netCDF variable that is referenced by the coordinates attribute of a data variable, e.g.lines 8 and 57 in Fig. 3.
Calendar time in CF (year, month, day, hour, minute, second) is encoded with units "time unit since reference datetime" (e.g.line 9 in Fig. 3).The encoded coordinates are the elapsed times since the reference and as such are useful for computing time differences.CF does not use strings for time because they cannot be used for such computations, are inconvenient to standardise, take more storage space, and cannot always be ordered monotonically.The encoding depends on the calendar (e.g.line 10 in Fig. 3), which defines the permitted values of the reference date (year, month, and day).The format of the units string conforms to udunits syntax (Sect.3.8), but the udunits software supports only the realworld Julian/Gregorian calendar and hence is not sufficient for use with CF, which recognises a wide selection of calendars, including those for climate models and palaeoclimate.For instance, 31 August 2003 is a valid date in the real-world Gregorian calendar but not in the "360-day" calendar, which has 12 30-day months.In the Gregorian calendar, 12:00 Z on 29 February 2000 is 36 583.5 days since 00:00 Z on 1 January 1900, but it is 36 058.5 days in the 360-day calendar.

y(y)
x (x) Figure 6.For grid axes based on a map projection, two-dimensional auxiliary coordinate variables must be used to store longitude and latitude values for each location (latitude-longitude lines dashed; grid lines solid).

Discrete axes and sampling geometries
A "discrete axis" is one which is not associated with any "continuous" coordinate or auxiliary coordinate variables.A variable is continuous along an axis if it makes physical sense to interpolate along that axis between its values.If that is not the case, then either there are no coordinate values or the coordinate values are discrete indices, whose order may or may not be meaningful.Consider, for example, an ensemble of model experiments, each of which produces a data variable V (t, z, y, x) containing air pressure as a function of time and spatial location.It may be convenient to combine the data variables into a single variable V (e, t, z, y, x), where e is the ensemble dimension, defining a discrete axis of ensemble members.The members could be identified by a numeric monotonic coordinate variable with the same name as the dimension containing a member number.Alternatively, it is common for them to be identified by one or more strings, which we could store in auxiliary coordinate variables with the ensemble dimension containing, for instance, model names or experiment names.It usually would not make sense to interpolate between ensemble members, and their order may be immaterial.
An important use of discrete axes in CF is to store data from a collection of "discrete sampling geometries" (DSGs) in a single data variable.In a DSG, the data have a lower dimensionality than the space-time domain, because they apply to a point or path within the domain.For example, a collection of time series of surface air temperature at meteorological stations can be stored in a two-dimensional data variable (Fig. 5) with a dimension that is the discrete axis for the stations (the "point" axis) and a dimension for the time series values (the "time" axis).Auxiliary coordinate variables for the discrete axis can be used to provide location information and station names (Fig. 5).Although the stations are physically located in two or three spatial dimensions, these have been combined into the one discrete axis.Other examples of DSGs are a vertical profile (variation along a vertical axis at a fixed time and spatial location) and a trajectory (variation along a path through space as a function of time).A DSG may also be called a "feature" and the type of DSG is called its "feature type".The feature type (point, time series, trajectory, profile, etc.) describes the DSG and specifies the dimensionality of the data within the space-time domain.Prior to the introduction of DSGs in version 1.6 of CF, the concept of DSG features was implicit in the sense that they could be inferred from the existence of a discrete axis and well-defined coordinate and auxiliary coordinate variables, but they were not formally described.In general, the netCDF file attribute featureType specifies the DSG feature type for every data variable in the file; but in the cases where the featureType attribute is permitted to be missing, the feature type may be inferred from the dimensions and space-time coordinates alone.
Many DSGs many be stored in one file, but they might have different coordinates, e.g. each time series might have its own set of sampling times (different days or hours of observation), or each profile might have its own set of vertical levels (e.g.air pressure reported by radiosondes).If each feature in a large collection is stored as an individual data variable with its own dimensions and coordinate, the file will be cumbersome.If they are combined into a single data variable, a one-dimensional coordinate variable would need to contain the union of all the coordinates (times, levels, etc.) required, and the dimension of the combined data variable might be much larger than needed for the available data, containing a lot of missing data elements.
As an alternative, CF provides three other methods for storing collections of data on DSGs, all intended to allow data with different dimensionality to be stored in a single data variable without wasting so much space.In the "incomplete multi-dimensional array" representation, the dimension required for the longest feature is used for all features, so that the shorter features must be padded with missing values; this sacrifices storage space to achieve simplicity for reading and writing.The "contiguous ragged array" and "indexed ragged array" representations eliminate the need for padding and thus reduce further the storage required, but they are more complex to pack and unpack.In the former case, each feature in the collection occupies a contiguous block, requiring the size of each feature to be known at the time that it is created.In the latter case, the values of each feature in the collection are interleaved.This representation can therefore be used for real-time data streams that contain reports from many sources, with the data being written as they arrive.The ragged array representations are described in more detail in Appendix B.
Because these storage methods were introduced (in CF version 1.6) at the same time as the recognition and definition of feature types, the two are often thought of as belonging together, but this causes confusion.The featureType is meta-data, and it refers to the physical construction and interpretation of a DSG data variable.The three new storage mechanisms for DSGs do not involve any new or distinct physical concepts.

Bounds and cells
It is often necessary to know the extent of a cell as well as the grid point location, e.g. to calculate the area of a latitudelongitude box or the thickness of a vertical layer.If cell bounds are not provided, then there is no default assumption about cell sizes (an application might reasonably assume that grid points are at the centres of non-overlapping cells, but that is not required by CF).
CF provides a way to attach bounds variables to any variable containing coordinate data.A bounds variable has an extra dimension to index the vertices of the cells.The simplest case is shown for a one-dimensional coordinate variable in Fig. 7.In this case, the values p(i) may be a series of successive time instants, for example, midday on 6 and 7 November, bounded by b(i, q) at midnight on 6, 7, and 8 November, with q = 0, 1.While the bounds for a onedimensional coordinate variable of dimension (n) could often be stored in a vector of dimension (n + 1), CF uses (n, 2) instead as shown because it is convenient for use with the netCDF unlimited dimension, and because it allows cells to be non-contiguous or overlapping.The bounds can be used to test contiguity; in the figure, cell i and cell i + 1 are contiguous because b(i + 1, 0) = b(i, 1).For multi-dimensional auxiliary coordinate variables, such as the two-dimensional latitude and longitude variables illustrated above, we have to supply the coordinates of each vertex of the polygon and contiguity can similarly be tested by coincidence of vertices.A bounds variable is encoded as a netCDF variable that is referenced by the bounds attribute of a coordinate or auxiliary coordinate variable and spans the same dimensions (as well as the extra dimension defining the number of vertices), e.g.lines 6, 22, and 36 in Fig. 3. Some applications require information about the size, shape, or location of the cells that cannot be deduced without specialist knowledge which is not guaranteed to be available.For example, in computing the mean of several cell values, it is often appropriate to "weight" the values by area, but for some grids (such as some types of spherical geodesic grids) the cell perimeter is not uniquely defined by its vertices and so the area cannot be inferred from the available information.For this case, CF provides cell measures variables which contain such information and are encoded as netCDF variables which are referenced by the cell_measures attribute of a data variable and span a subset of the data variable's dimensions, e.g.lines 38 and 58 in Fig. 3.

Variation within cells
CF describes variation within cells by use of "cell methods".By default, it is assumed that intensive quantities apply at grid points, e.g.temperature values apply at the spatial points and instants of time specified by their coordinates, while extensive quantities apply to the entire grid cell, e.g. a precipitation amount (kg m −2 ) is an accumulation in time.
The method may be different for each axis, e.g.precipitation amount is intensive in space even though it is extensive in time.Because the default is not always obvious, it is recommended that the method be stated explicitly for every axis.Non-default methods include operations such as mean, maximum, minimum, and standard deviation.A zonal-mean variable, for instance, has a cell methods attribute that specifies it is a mean over longitude.A time series of daily maximum values has cell methods indicating that the values are maxima within their cells in time.The operations recorded by cell methods might affect more than one axis at once, e.g. for the cell methods necessary to describe maximum of the ocean meridional overturning stream function within a depth-latitude cell.By default, the method for horizontal cells is assumed to have been evaluated over the entire area of the cell.It is, however, possible to limit consideration to only a portion of a cell, e.g. to record that values apply only to the fractions of cells which are land (as opposed to sea).
A further use of cell methods is to characterise climatological statistics where a series of data points represent sets of subintervals which are not contiguous.There are three kinds to consider: 1. corresponding portions of the annual cycle in a set of years, e.g.decadal averages for January; 2. corresponding portions of a range of days, e.g. the average diurnal cycle in April 1997; and 3. both at once, e.g. the average winter daily minimum temperature from the years 1961 to 1990.
In the latter example, the bounds are 00:00 Z 1 December 1961 (beginning of the first day of the first winter) and 00:00 Z 1 March 1991 (end of the last day of the last winter), and the cell methods indicate the values are a minimum within days, a mean over a season, and a mean over years.
Cell methods are encoded in the cell_methods attribute of a data variable, e.g.line 56 in Fig. 3.In this example, the cell method t: mean (interval: 1 day) specifies that data values are means over the time dimension (i.e.temporal averages) calculated from daily samples.Note that this netCDF file does not contain a dimension for time, but it is sufficient that one is implied by the time scalar coordinate variable on line 8.

Ancillary data
When metadata to describe the data depend on location within the domain, they are stored in independent variables called ancillary data variables.For example, each value of an array of instrument data may have associated measures of uncertainty or of the status of the recording instrument.An ancillary data variable is encoded as a netCDF variable that is referenced by the ancillary_variables attribute of a data variable and spans a subset of the data variable's dimensions, e.g.lines 60 and 68 in Fig. 3.

Units and standard name
A range of attributes is available, introduced by the netCDF user guide or CF, providing metadata for interpreting the values of individual variables or about the dataset as a whole.In this section, we discuss the two most important of these.
CF requires all variables with values (data variables, coordinate variables, etc.) to have units unless they contain dimensionless numbers or cell boundary values.The units are specified by a string attribute (e.g.lines 9 and 55 of Fig. 3), formatted according to the Unidata udunits conventions (Emmerson, 2007), which support many possible units and varieties of syntax, e.g.metre, meter, meters, m, km, cm, second, s, kelvin, K, Pa, W m-2, W/m ∧ 2, kg/m2/s, 1 (or any number).Many non-SI units are also supported by udunits, e.g.degree, degree_north, degree_N, percent (equivalent to 0.01), ppm (equivalent to 1e-6), mbar, mile, degC, degF, hours, and days.
For systematic identification of the physical quantity contained in variables, CF defines a "standard name" string attribute (e.g.lines 28, 54 and 62 of Fig. 3), with permissible values listed in the standard name table (http://cfconventions.org/standard-names.html),which includes precise definitions.The standard name table is managed by a community process and is continually expanding -version 44 of the table, released in May 2017, contains 2847 standard names.
CF also upholds the use of the "long name" defined by the netCDF user guide, but this is ad hoc.In contrast, the CF standard names are consistently constructed and documented.As CF is applicable to many areas of geoscience, the standard names have to be more self-explanatory and informative than would suffice for any one area.For instance, there is no name for plain "potential temperature", since we have to distinguish air potential temperature and www.geosci-model-dev.net/10/4619/2017/Geosci.Model Dev., 10, 4619-4646, 2017 Figure 8.The relationships between CF-netCDF elements (corresponding to the blue CN layer in Fig. 1) and their corresponding netCDF variables, dimensions, and attributes (the yellow NC layer in Fig. 1) described using UML (Appendix A).It is useful to define an abstract generic coordinate variable that can be used to refer to coordinates when the their type (coordinate, auxiliary, or scalar coordinate variable) is not an issue.The CF convention details the mechanisms which are used in the netCDF file to express the relationships among the CF-netCDF elements, but these are not shown in the UML.
sea water potential temperature.Standard names are often longer than the terms familiarly used by the experts in particular discipline, because they answer the question, "What does this mean?",rather than the question, "What do you call this?".For example, the quantity often called "precipitable water" by meteorologists has the standard name of atmosphere_mass_content_of_water_vapor.Standard names have a detailed description which further defines parts of the name; for example, the description of the standard name land_ice_calving_rate notes that "land ice" means glaciers, ice caps, and ice sheets resting on bedrock, and the land ice calving rate is the rate at which ice is lost per unit area through calving into the ocean.Each standard name also implies particular physical dimensions (mass, length, time, and other dimensions corresponding to SI base units, expressed as a "canonical unit"); for example, large-scale rainfall amount (canonical unit kg m −2 ), large-scale rainfall flux (kg m −2 s −1 ), and large-scale rainfall rate (m s −1 ) are all different in CF, although they might all be vaguely referred to as "large-scale rain".Standard names have been defined for both more general and more specific quantities, for different applications, e.g.ocean_mixed_layer_thickness and ocean_mixed_layer_thickness_defined_by_temperature.Some standard names require the existence of additional metadata and/or constraints on the values of the variables with which they are associated.For example, the standard name of down-welling_radiance_per_unit_wavelength_in_air requires there to be a coordinate variable storing the radiation wavelength.
The CF conventions use size one or scalar coordinate variables (Sect.3.3) and the cell_methods attribute (Sect.3.6) to describe some aspects of a variable, and this means standard names do not always correspond to identities of variables in other file formats.For instance, to describe the time-mean air temperature at 1.5 m above the ground, air_temperature alone is the standard name; "time-mean" is described by cell_methods and the height as a coordinate.

A CF data model
The aspects of the CF conventions discussed in Sect. 3 (i.e.CF-netCDF elements) are listed in Table 2 and shown (in blue) with their interrelationships in UML (Appendix A) in Fig. 8.The CF-netCDF elements and the relationships among them could be regarded as defining a CF data model (rather like that described in Sect.5.2) to which any CFcompliant dataset may be mapped.It is not the data model that we propose, however, because it does not meet all of the design criteria described in Sect.1.1.Our CF data model (the green layer in Fig. 1) has been derived from these CF-netCDF elements and relationships with the aims of removing aspects specific to the netCDF encoding, and reducing the number of elements, whilst retaining the ability to describe the CF conventions fully.The elements of our CF data model are called "constructs", a term chosen to differentiate from the CF-netCDF elements previously defined and to be programming language-neutral (i.e. as opposed to "object" or "structure").In this section, we relate the constructs to the CF-netCDF elements of Sect.3, which in turn relates the CF-netCDF elements to the components of netCDF files.To clarify these connections, the example netCDF file shown in Fig. 3 indicates how its elements relate to the constructs of our CF data model.

The field construct
The field construct is central to our CF data model and includes all the other constructs (Fig. 9).A field corresponds to a CF-netCDF data variable with all of its metadata.All CF-netCDF elements are mapped to some element of the CF field construct, as we describe in following subsections, and the field constructs completely contain all the data and metadata which can be extracted from the file using the CF conventions.Note that the constructs contained by the field construct cannot exist independently, as is indicated by the nature of the class associations shown in Fig. 9 (see Table 1 for details).
The field construct consists of a data array and the definition of its domain (i.e.V (d) in Eq. 1), ancillary metadata fields defined over the same domain, and cell method constructs to describe how the cell values represent the variation of the physical quantity within the cells of the domain (Fig. 9).Because the CF conventions do not mention the concept of the domain, we do not regard it as a construct of the data model.Instead, the domain is defined collectively by various other constructs included in the field.All of the data model constructs are listed in Table 3, which refers to the sections and figures in which they are fully described.All of the constructs contained by the field construct are optional (as indicated by "0..*" in Fig. 9).The only component of the field which is mandatory is the data array.
The field construct also has optional properties to describe aspects of the data that are independent of the domain.These correspond to some netCDF attributes of variables (e.g. the units, long_name, and standard_name; Sect.3.8), and some netCDF global file attributes (e.g."history" and "institution").We use the term "property", rather than "attribute", because not all CF-netCDF attributes are properties in this sense -some CF-netCDF attributes are used to point to (i.e.reference) other netCDF variables and so only describe the data indirectly (e.g. the "coordinates" attribute; Sect.3.3), and others have structural functions in the CF-netCDF file (e.g. the "Conventions" attribute).In the data model, we consider that netCDF global file attributes apply to every data variable in the file, except where they are superseded by netCDF data variable attributes with the same name.This interpretation of global file attributes is not stated in the CF conventions, but for our data model it is necessary because there is no notion of a file.Hence, metadata stored in attributes of the file as a whole have to be transferred to the field construct.If present, the global file attribute featureType applies to every data variable in the file with a discrete sampling geometry (Sect.3.4).Hence, we regard feature type as a property of the field construct.
The standard_name property (Sect.3.8) constrains the units property (i.e.only certain units are consistent with each standard name) and in some cases also the dimensions that a data variable must have.These constraints, however, do not supply any further information -they are just for selfconsistency.This is also the case for the feature type property, which imposes some requirements on the axes the domain must have.Following our aim of constructing a minimal data model, we do not regard the standard name nor feature type as separate constructs within the field, because they do not depend on any other construct for their interpretation.This is unlike a cell method, for instance, which depends on the data variable's dimensions for its interpretation.

Domain axis construct and the data array
A domain axis construct (Fig. 10) specifies the number of points along an independent axis of the domain.It comprises a positive integer representing the size of the axis.In CF-  When a collection of DSG features has been combined in a data variable using the incomplete orthogonal or ragged representations to save space, the axis size has to be inferred, but we regard this as an aspect of unpacking the data, rather than its conceptual description.In practice, the unpacked data array may be dominated by missing values (as could occur, for example, if all features in a collection of time series had no common time coordinates), in which case it may be preferable to view the collection as if each DSG feature were a separate variable (Sect.3.4), each one corresponding to a different field construct.

Coordinates: dimension coordinate and auxiliary constructs
Coordinate constructs (Fig. 10) provide information which locate the cells of the domain and which depend on a subset of the domain axis constructs.As previously discussed, there are two distinct types of coordinate construct: a dimension coordinate construct provides monotonic numeric coordinates for a single domain axis, and an auxiliary coordinate construct provides any type of coordinate information for one or more of the domain axes.
In both cases, the coordinate construct consists of a data array of the coordinate values which spans a subset of the domain axis constructs, an optional array of cell bounds recording the extents of each cell, and properties to describe the coordinates (in the same sense as for the field construct).An array of cell bounds spans the same domain axes as its coordinate array, with the addition of an extra dimension whose size is that of the number of vertices of each cell.This extra dimension does not correspond to a domain axis construct since it does not relate to an independent axis of the domain (for example, the bounds dimension defined at line 6 in Fig. 3 does not correspond to a domain axis construct).Note that, for climatological time axes, the bounds are interpreted in a special way indicated by the cell method constructs.
The dimension coordinate construct is able to unambiguously describe cell locations because a domain axis can be associated with at most one dimension coordinate construct, whose data array values must all be non-missing and strictly monotonically increasing or decreasing.They must also all be of the same numeric data type.If cell bounds are provided, then each cell must have exactly two vertices.CF-netCDF coordinate variables and numeric scalar coordinate variables correspond to dimension coordinate constructs.
Auxiliary coordinate constructs have to be used, instead of dimension coordinate constructs, when a single domain axis requires more then one set of coordinate values, when coordinate values are not numeric, strictly monotonic, or contain missing values, or when they vary along more than one domain axis construct simultaneously.CF-netCDF auxiliary coordinate variables and non-numeric scalar coordinate variables correspond to auxiliary coordinate constructs.
If a domain axis construct does not correspond to a continuous physical quantity, then it is not necessary for it to be associated with a dimension coordinate construct.For example, this is the case for an axis that runs over ocean basins or area types, or for a domain axis that indexes a time series at scattered points.In such cases, one-dimensional auxiliary coordinate constructs could be used to store coordinate values.These axes are discrete axes in CF-netCDF.

Coordinate reference construct
The domain may contain various coordinate systems, each of which is constructed from a subset of the dimension and auxiliary coordinate constructs.For example, the domain of a four-dimensional field construct may contain horizontal (yx), vertical (z), and temporal (t) coordinate systems.There may be more than one of each of these, if there is more than one coordinate construct applying to a particular spatiotem-Geosci.Model Dev., 10, 2017 www.geosci-model-dev.net/10/4619/2017/ .The nine constructs of our CF data model (corresponding to the green layer in Fig. 1) described using UML (Appendix A).In this and all other CF data model diagrams, the CF data model constructs are labelled with the "construct" stereotype to distinguish them from other data model elements which appear in other diagrams as green boxes with no label.The field construct corresponds to a CF-netCDF data variable.Relationships between other constructs and CF-netCDF are given in Figs. 10 and 11.The domain provides the linkage between the field construct and the constructs which describe measurement locations and cell properties.It is not a construct of the data model (see Sect. 4.1) but an abstract concept that is useful for understanding it.Similarly, it is useful to define an abstract generic coordinate construct that can be used to refer to coordinates when the their type (dimension or auxiliary coordinate construct) is not an issue.poral dimension (for example, there could be both latitudelongitude and y-x projection coordinate systems).In general, a coordinate system may be constructed implicitly from any subset of the coordinate constructs, yet a coordinate construct does not need to be explicitly or exclusively associated with any coordinate system.
A coordinate system of the field construct can be explicitly defined by a coordinate reference construct (Fig. 11) which relates the coordinate values of the coordinate system to locations in a planetary reference frame and consists of the following: -The dimension coordinate and auxiliary coordinate constructs that define the coordinate system to which the coordinate reference construct applies.Note that the coordinate values are not relevant to the coordinate reference construct, only their properties.
-A definition of a datum specifying the zeroes of the dimension and auxiliary coordinate constructs which define the coordinate system.The datum may be explicitly indicated via properties, or it may be implied by the metadata of the contained dimension and auxiliary coordinate constructs.Note that the datum may contain the definition of a geophysical surface which corresponds to the zero of a vertical coordinate construct, and this may be required for both horizontal and vertical coordinate systems.
-A coordinate conversion, which defines a formula for converting coordinate values taken from the dimension or auxiliary coordinate constructs to a different coordinate system.A term of the conversion formula can be a scalar or vector parameter which does not depend on any domain axis constructs, may have units (such as a reference pressure value), or may be a descriptive string (such as the projection name "mercator"), or it can be a domain ancillary construct (such as one containing spatially varying orography data).
For y-x coordinates, the coordinate conversion is either a map projection, which converts between Cartesian coordinates and spherical or ellipsoidal coordinates on the vertical datum, or a conversion between different spherical coordinate systems (as in the case of rotated pole coordinates).In the case of z coordinates, the conversion is between a coordinate construct with parameterised values (such as ocean sigma coordinates) and a coordinate construct with dimensional values (such as depths), again with respect to the vertical datum.
In some cases, the datum is not required as it is already described by the dimension and auxiliary coordinate constructs.This is the case in CF for the two-dimensional geographical latitude-longitude coordinate system based upon a spherical Earth, which is assumed to have a datum at 0 • N, 0 • E. Similarly, the coordinate conversion is not required if no other coordinate systems are described.Some parts of the coordinate reference construct may not be relevant to a given coordinate construct which it contains.The relevant parts are determined by an application using the coordinate reference construct.For example, for a coordinate reference construct which contained coordinate constructs for y-x projection and latitude and longitude coordinates, a datum comprising a reference ellipsoid would apply to all of them, but projection parameters would only apply to the projection coordinates.
In CF-netCDF, coordinate system information that is not found in coordinate or auxiliary coordinate variables is stored in a grid mapping variable or the formula_terms attribute of a coordinate variable, for horizontal or vertical coordinate variables, respectively.Although these two cases are arranged Figure 10.The relationship between domain axis, dimension coordinate, and auxiliary coordinate constructs (Sect.4.2 and 4.3) and CF-netCDF described using UML (Appendix A).A dimension or auxiliary coordinate construct is defined by a CF-netCDF coordinate, scalar coordinate, or auxiliary coordinate variable, and the associated CF-netCDF boundary variable if it exists.A generic coordinate construct spans one or more domain axis constructs, but the mapping of which ones is only held by the parent field construct.differently in CF-netCDF, each one contains, sometimes implicitly, a datum or a coordinate conversion formula (or both) and so may be mapped to a coordinate reference construct.A grid mapping name or the standard name of a parametric vertical coordinate corresponds to a string-valued scalar parameter of a coordinate conversion formula.A grid mapping parameter which has more than one value (as is possible with the "standard parallel" attribute) corresponds to a vector parameter of a coordinate conversion formula.A data variable referenced by a formula_terms attribute corresponds to the term of a coordinate conversion formula -either a domain ancillary construct or, if it is zero-dimensional, a scalar parameter.

Domain ancillary construct
A domain ancillary construct (Fig. 11) provides information which is needed for computing the location of cells in an alternative coordinate system.It is the value of a term of a coordinate conversion formula that contains a data array, which is zero-dimensional or which depends on one or more of the domain axes.
It also contains an optional array of cell bounds recording the extents of each cell (only applicable if the array contains coordinate data) and properties to describe the data (in the same sense as for the field construct).An array of cell bounds spans the same domain axes as the data array, with the addition of an extra dimension whose size is that of the number of vertices of each cell.
CF-netCDF variables named by the formula_terms attribute of a CF-netCDF coordinate variable correspond to domain ancillary constructs.These CF-netCDF variables may be coordinate, scalar coordinate, or auxiliary coordinate variables, or they may be data variables.For example, in a coordinate conversion for converting between ocean sigma and height coordinate systems, the value of the "depth" term for horizontally varying distance from ocean datum to sea floor would correspond to a domain ancillary construct.In the case of a named term being a type of coordinate variable, that variable will correspond to an independent domain ancillary construct in addition to the coordinate construct.

Cell measure construct
A cell measure (Fig. 9) construct provides information that is needed about the size or shape of the cells and that depends on a subset of the domain axis constructs.Cell measure constructs have to be used when the size or shape of the cells cannot be deduced from the dimension or auxiliary coordinate constructs without special knowledge that a generic application cannot be expected to have.(Sect. 4.4 and 4.5) and CF-netCDF described using UML (Appendix A).A coordinate reference construct is defined either by a grid mapping variable or a formula_terms attribute of a CF-netCDF coordinate variable.The coordinate reference construct is composed of generic coordinate constructs, a datum, and a coordinate conversion formula.The coordinate conversion formula is usually defined by a named formula in the CF conventions.A domain ancillary construct term of a coordinate conversion formula is defined by a CF-netCDF data variable or a CF-netCDF generic coordinate variable.
The cell measure construct consists of a numeric array of the metric data which span a subset of the domain axis constructs, and properties to describe the data (in the same sense as for the field construct).The properties must contain a "measure" property, which indicates which metric of the space it supplies, e.g.cell horizontal areas, and a units property consistent with the measure property, e.g.m 2 .It is assumed that the metric does not depend on axes of the domain which are not spanned by the array, along which the values are implicitly propagated.CF-netCDF cell measure variables correspond to cell measure constructs.

Field ancillary constructs
The field ancillary construct (Fig. 9) provides metadata which are distributed over the same sampling domain as the field itself.For example, if a data variable holds a variable retrieved from a satellite instrument, a related ancillary data variable might provide the uncertainty estimates for those retrievals (varying over the same spatiotemporal domain).
The field ancillary construct consists of an array of the ancillary data, which is zero-dimensional or which depends on one or more of the domain axes, and properties to describe the data (in the same sense as for the field construct).It is assumed that the data do not depend on axes of the domain which are not spanned by the array, along which the values are implicitly propagated.CF-netCDF ancillary data variables correspond to field ancillary constructs.Note that a field ancillary construct is constrained by the domain definition of the parent field construct but does not contribute to the domain's definition, unlike, for instance, an auxiliary coordinate construct or domain ancillary construct.

Cell method construct
The cell method constructs (Fig. 9) describe how the cell values represent the variation of the physical quantity within its cells -the structure of the data at a higher resolution.A single cell method construct consists of a set of axes (see below), a "method" property which describes how a value of the field construct's data array describes the variation of the quantity within a cell over those axes (e.g. a value might represent the cell area average), and properties serving to indicate more precisely how the method was applied (e.g.recording the spacing of the original data, or the fact the method was applied only over El Niño years).
The field construct may contain an ordered sequence of cell method constructs describing multiple processes which have been applied to the data, e.g. a temporal maximum of the areal mean has two components -a mean and a maximum -each acting over different sets of axes.It is an ordered sequence because the methods specified are not necessarily commutative.There are properties to indicate climatological time processing, e.g.multiannual means of monthly maxima, in which case multiple cell method constructs need to be considered together to define a special interpretation of boundary coordinate array values.The cell_methods attribute of a CF-netCDF data variable corresponds to one or more cell method constructs.
The axes over which a cell method applies are either a subset of the domain axis constructs or a collection of strings which identify axes that are not part of the domain.The latter case is particularly useful when the coordinate range for an axis cannot be precisely defined, making it impossible to define a domain axis construct.For example, a climatological time mean might be based on data which are not available over the same time periods at every horizontal location -useful information can still be conveyed by recording the fact the data have been temporally averaged without specifying the range of times.The strings which identify such axes are well defined in that they must be standard names (e.g.time, longitude) or the special string "area", indicating a combination of horizontal axes.

Relationship to other data models
A data model does not exist on its own, and those exploiting it will need to interpret it in the context of other data models with which they already work, whether they are implicit or explicit (Sect.1.1).Often a clear, unambiguous, and universally agreed (or even agreeable) mapping between data models is not possible.Nativi et al. (2008), who establish a mapping between the Unidata Common Data Model and relevant international standards, discuss many of the relevant issues.Here, we confine ourselves to drawing some parallels between our explicit CF data model and selected other data modelling activities.We begin with the ISO 19123 coverage model, which provides an abstract view of the problem arena.
Readers who are not familiar with other data models may wish to omit this section on a first reading, as it is not required to understand the CF conventions, the CF data model presented here, nor the software implementation of Sect.6.

5.1
The ISO 19123 coverage model ISO 19123 (International Standards Organisation, 2007) provides a language and conceptual schema for describing "coverages", that is, for datasets which assign data values to specified data locations (in space and/or time).ISO 19123 builds on a range of other ISO standards for geographical information, collectively known as the ISO 191xx series.
An ISO 191xx coverage may be viewed as a function whose inputs are spatiotemporal positions (the "domain") which are related to outputs comprising values of one or more geographical features (the "range" of the coverage).Thus, a coverage is notionally a function over a domain which has a range of values.Within ISO 19123, two types of coverage are defined: discrete and continuous.The former is the most relevant here (see Fig. 12a), having a domain which consists of a set of "domain objects", themselves described by spatial objects and temporal primitives.In such a coverage, all the objects must share the same coordinate reference system.Coordinate reference systems themselves can be compound, but the simplest single coordinate reference system effectively consists of a datum and a set of coordinates (which might include one or more parametric coordinates) defining a coordinate system.Although the language of much of the ISO 19123 specification is cast in terms of simple geospatial coordinates, these abstract coordinate systems are in fact fully general -although many ISO-compliant implementations restrict them to geospatial coordinates -and cannot reflect the full generality of CF coordinates such as wavelengths, ensembles, etc. UML diagrams (Appendix A) for the full ISO 19123 model are given in Fig. 12.
For point data, a discrete coverage is nearly identical to a CF field construct (V (d), as described in Sect.3).However, a CF field construct explicitly describes a single variable on a domain, whilst a discrete coverage allows for multiple variables on a shared domain.(In ISO 19123, such variables are termed "features", not to be confused with "sampling features" discussed below.)It is also worth noting that many, but not all, netCDF files containing CF data may well have CF data with a shared implied spatiotemporal domain, and such files may well map rather directly onto the coverage model, but as we have seen, this is not always the case, not least because multiple domain descriptions may appear in a given file.
Discrete coverages themselves can be further specialised into a set of more specialised coverages: sampling the domain with sets of points, a grid of points, or sets of curves, surfaces, or solids.
Of these, the most important (in terms of our CF data model) is the DiscreteGridPointCoverage (Fig. 12b).This relates a domain of grid points laid out in a grid to a range of values in GridValueMatrix.The grid can be defined in three different ways: as a set of grid points (with their own coordinates), as a "rectified" grid (a grid utilising a datum and coordinate vectors for which there is an affine transformation between the coordinates and those of an external coordinate reference system), or a referenceable grid which provides parametric relations between a position using a coordinate tuple and a position in a planetary reference frame using a specific coordinate reference system.The values can appear in a grid matrix, which defines how the sequence of values is laid out against the positioning exposed via the domain grid (in any of the three forms).
There is a clear correspondence between the CF dimension coordinate construct and an ISO coordinate reference system as used in a rectified grid, and between the underlying concepts of ISO parametric coordinate reference systems within a referenceable grid and a CF domain described using CF auxiliary coordinate constructs.This correspondence together supports the identification of an ISO grid (which itself carries little information apart from a name and a list of axes) with the abstract notion of a CF domain described by CF coordinate reference constructs.
Even with an ISO rectified grid, which has the easiest correspondence with CF, there are subtle but impor-

SC_CompoundCRS
Instants and periods are defined using temporal coordinate reference systems which include calendars etc and are defined in ISO19108.tant differences in the treatment of coordinates, probably the most important of which is that the CF equivalent of the ISO datum is often held in the standard name of the coordinate construct.For example, a CF coordinate construct with a standard name of height means the coordinate is with reference to the surface, i.e. the bottom of the atmosphere (distinct from other valid vertical coordinates such as height_above_reference_ellipsoid and height_above_sea_floor).These coordinates are all distinct geophysical quantities, with vertical datums of the surface, the reference ellipsoid, and the sea floor, respectively, though they all have the same canonical unit of measure (metres) and direction (values increase for locations further above the datum).

SC_CRS
Where a more precise specification of the datum may be needed (for example, the figure of the reference ellipsoid or the reference point for a latitude-longitude coordinate system where it is not the default of the intersection of the Equator and the Greenwich meridian), it can be supplied by the coordinate reference construct, not the standard name.This CF separation of grid mapping datum from coordinates adds value because changing the datum does not alter the geophysical nature of the coordinate and its interpretation.The partitioning is suitable and convenient for many purposes of data analysis, in which coordinate constructs are processed independently, without the need for awareness of a full ISO coordinate reference system (CRS).It arises from the generality of CF, in which spatiotemporal coordinates are used having a wider variety than in the geographic information system (GIS); non-spatiotemporal coordinates are also needed; and the data are often from idealised worlds (such as in climate models).
Whether grid cells are described directly or implied via referenceable or rectified grids, it is important to note that in the ISO world, the cells lie between the edges laid out by the coordinates, whereas by contrast the notion of a cell in CF -defined by the cell measure construct (Sect.4.6) -is more directly analogous (Fig. 13) to the FootPrint in an ISO coverage which represents the sample space associated with a grid point (Fig. 12b).There is also a richer set of semantics in CF associated with the cell method construct (Sect.4.8).
It might appear that some of the more complex geometries underlying CF fields which are expressed on domains sampled using the CF DSG features best be mapped onto other specialisations of DiscreteCoverages -this is the approach taken by Nativi et al. (2008) who use the DiscreteCurve-Coverage for mapping ISO coverages onto Unidata Common Data Model (Sect.5.3) profile and trajectory data.However, we would assert that the underlying semantics of CF as expressed by the concept of cells are generalisable to multidimensional sampling inside the DiscreteGridPoint coverage.This is in part why we see the CF sampling feature types as simply specialisations of the CF field, possibly with a specific storage pattern.This simple mapping between CF "domains" and ISO DiscreteGridPoint coverage is also the approach taken by the Open Geospatial Consortium (OGC) CF-netCDF extension standard (Sect.5.2 below) which describes another set of possible relationships between CF-netCDF and ISO 19123, albeit without the higher-level CF constructs we introduce here.ISO 19156 Observations and Measurements (International Standards Organisation, 2011) introduces sampling features with a range of geometric spatial properties (e.g.sampling along a curve, such as a trajectory).CF discrete sampling geometries can be mapped onto these sampling features, but it is important to note that ISO 19156 explicitly expects actual observation (or simulation) values to be subsampled.Thus, the observations are amenable to recording with more general discrete coverages, just as we have done here by treating all CF discrete sampling geometries as CF field constructs, optionally labelled with a feature type property to help understand the intended use of the axes.

The OGC CF-netCDF standard
The Open Geospatial Consortium standard introduced above (Dominico and Nativi, 2013) in the context of complex geometries presents their own CF data model: the CF-netCDF extension model.Their model differs from ours in four major ways: 1.It is not the complete CF version 1.6 (for example, it does not appear to include ancillary data variables).
2. Their model makes some elements of CF mandatory, in order to facilitate the ISO 19123 coverage interoperability, which is their target.
3. It is tied to the netCDF format.
4. It is constructed in order to map as closely as possible onto the ISO 19123 coverage model but without being faithful to CF; so, for example, it introduces the notion of a CF coordinate system including a notional Hori-zontalCRS, which is independent of explicitly identified horizontal and vertical coordinates (their Fig. 4).
By contrast, we have only introduced new concepts as abstractions where they help interpret and use CF itself (again, for example, in our case the domain and abstract coordinate).
The last of these points is of course subjective.We would argue that our approach is the most consistent with a faithful model of CF, but it is clear that CF itself currently admits a multitude of possible interpretative models as well as a multitude of correct (if limited and/or constrained) implementations such as this one.

The Unidata Common Data Model
The Unidata Common Data Model (CDM; Unidata, 2014a) is an abstract data model for scientific datasets that is a superset of the netCDF classic and enhanced data models (Sect.2).
In addition to netCDF, and similarly to the CF data model presented here, it consists of multiple layers: a data access layer, which handles data reading and writing, and merges the netCDF enhanced, OPeNDAP (Open-source Project for a Network Data Access Protocol, https://www.opendap.org)and HDF (Hierarchical Data Format, https://www.hdfgroup.org)data models to create a common application programming interface (API); a coordinate system layer, which handles the coordinates of data arrays; a feature type layer, which handles similar notions to those we express with CF field constructs and the CF sampling feature types; and a mature Java-based implementation which reads, manipulates, and writes the CDM sampling features.
The CDM data access layer has a broader scope than ours (being about more than just netCDF).If we consider that most of the CF standard as expressed in our data model is about handling coordinates, cells, and domains, then our CF data model corresponds to CDM coordinate system layer, with the various CF feature types of our CF data model field constructs corresponding to the CDM feature type layer.The cf-python software (Sect.6) corresponds to the CDM Java implementation, but our CF data model is intended to be useful outside the context of our cf-python application.
The CDM data access layer handles more data types (Fig. 14), in particular structures and sequences which allow more complex data types and the iteration through data of (a priori) unknown length.These data types are not yet required by CF-netCDF (Sect.2), so we do not consider them in our CF data model.Within the coordinate system layer, there is much closer correspondence between the CDM and our CF data model.
-A CF dimension or auxiliary coordinate construct maps to a CDM CoordinateAxis.-The datum and coordinate conversion components of a CF coordinate reference construct are components of a CDM CoordinateTransform.
-A CF coordinate reference construct maps to a CDM CoordinateSystem.
The last of these has one exception: a CDM CoordinateSystem must contain at least one CDM CoordinateAxis, whereas CF dimension and auxiliary coordinate constructs are optional in a CF coordinate reference construct.In other words, our CF data model can record a coordinate system datum in the absence of coordinate values.This is useful when the CF field construct properties, rather than its coordinates, define the extent of the data array.
A CDM CoordinateAxis may be subtyped into axes which can be specifically exploited by the sampling feature types in the CDM feature type layer, where there are significant differences from our CF data model.The CDM feature type implementation is discussed in Unidata (2014b); CDM implements feature collections as collections of point features in such a way that a profile, for example, is a collection of point features along a vertical line, and so a profile collection is a nested point feature collection.Similarly, a trajectory feature is a collection of points along a path through time and space, with collections which are nested point feature collections (Fig. 15).Other CDM feature types are built from this base.The CDM data model exposes these concepts directly.By contrast, our CF data model does not expose any of these ideas directly, with the interpretation left entirely to software implementations: our CF data model simply exposes the appropriate coordinates and their interpretation is either inferable from the nature of those coordinates or is made explicit via the feature type property, which is effectively a constraint with a label (Sect.4.1).

cf-python: a data model implementation
A key use of our data model is to enable the creation of wholly CF-compliant software, i.e. software that can represent and manipulate any CF-compliant dataset.Such software corresponds to one of the application boxes in Fig. 1. cf-python is a data analysis software library written for the Python programming language that implements the CF data model presented here for its internal data structures and so is able to process any CF-compliant dataset.It is, however, not strict about CF compliance so that partially conformant datasets may be ingested and their deficiencies corrected with the API.cf-python is open-source software and is free to download and install from the Python package index at https://pypi.python.org/pypi/cf-python. cf-python implements the data model constructs and their relationships exactly as shown in Figs.9-11, but we refer to the cf-python realisations of CF data model constructs as "objects" in order to distinguish between abstract concepts and their instantiated counterparts.cf-python (version 2.1) can read field objects from netCDF files or create new field objects, manipulate field objects in memory, and write field objects to netCDF files.Once a field object exists a range of operations are possible, including to create, delete, and modify a field object's data and metadata; select and subspace field objects according to their metadata; perform arithmetic, comparison, and other mathematical operations involving field objects; collapse axes by statistical operations; perform operations with date-time data; regrid fields to new domains using the Earth System Modeling Framework high-performance software infrastructure (O'Kuinghttons et al., 2016); and visualise field objects by interfacing with the cf-plot Python package, which is also open source and freely available at https://pypi.python.org/pypi/cf-plot.
All of these operations are "metadata aware", which means that parameters needed for an operation need not be fully specified by the user, provided that field objects have sufficient metadata to infer the parameters unambiguously.This is greatly facilitated by having a data model, because all standardised metadata are stored in a fully defined manner and so the required parameters may be inferred unambiguously.In practice, a field object's metadata may be incomplete, in which case the user should use the cf-python API to supplement the metadata.For example, the cf-python command h=f.regrids(g) will create a new field object h which has the data from field object f regridded to the latitudelongitude plane of the domain of field object g.If the domains of f and g are not sufficiently described for this operation, then an error will be raised that states which information is missing.The full API documentation is available as part of the cf-python installation, as well as via the Python package index.
How cf-python implements the data model may be seen by using the library to read the CF-netCDF file described in Fig. 3 and inspecting the field objects that it creates.The code listed in Fig. 16 demonstrates importing the library and reading this netCDF file within the interactive Python shell, in which a command is preceded by the >> > prompt and followed by any printed output.After importing the cf-python library (import cf), the netCDF file example_file.ncdescribed in Fig. 3 is read into the variable f, which is a list of the file's two field objectsone containing air temperature data and the other containing the vertical integral of atmospheric water vapour data.See Sect."Interpreting CF-netCDF files" for a discussion on why only two field objects were created from the 17 netCDF variables in the file.The one-line description of each field object shows the size and physical nature of the data array dimensions and the units of the data array values.In Fig. 17, the first of the two field objects is selected and labelled t, and a more detailed look at its metadata is generated with the dump function.This exposes the constructs of the data model which have been instantiated for this field object and >>> import cf >>> f = cf.read('example_file.nc') >>> f [<CF Field: air_temperature(atmosphere_sigma_coordinate(20), projection_y_coordinate(110), projection_x_coordinate( 106)) K>,<CF Field: atmosphere_mass_content_of_water_vapor(projection_y_coordinate(110), projection_x_coordinate( 106)) kg m2>] Figure 16.Reading a file using cf-python.
shows the field object's properties (standard_name and source instantiated from lines 54 and 74 of Fig. 3).The output describes one field object; four domain axis objects and their sizes, including one which is implied by the "time" CF-netCDF scalar coordinate variable; one cell method object indicating that each data array value is a time average constructed from daily samples; one field ancillary object describing the uncertainty of the data array values; four dimension coordinate objects, each one spanning a unique domain axis object; two multi-dimensional auxiliary coordinate objects for true latitude and true longitude coordinates (as required by the CF conventions when the horizontal dimension coordinates are not canonical geographical latitudes and longitudes); three domain ancillary objects utilised by the coordinate reference objects; two coordinate reference objects: a vertical, atmosphere sigma coordinate system which references the domain ancillary objects and the vertical dimension coordinate object, and a horizontal Lambert conformal conic coordinate system which references the horizontal auxiliary and dimension coordinate objects; and one cell measure object containing horizontal cell areas.
The one-to-one correspondence between the data model and cf-python's interpretation of CF may also be demonstrated by inspecting the objects from which field object t is composed.In Fig. 18, the constructs function is used to return all of these objects, each having a similar name in camel case to its CF data model counterpart (e.g. a domain axis construct is represented by a DomainAxis object).This output demonstrates that it is only the field object which stores information on the whole domain.For example, the latitude auxiliary coordinate object has a two-dimensional data array shape of 110 × 106 but does not record what these dimensions physically represent (Fig. 10).The cell method object does, however, store references to the domain axes to which it applies (Sect.4.8) -the "dim3" in this example refers to the size one domain axis object, which is identified by the field object alone as being a time axis by virtue of the "time" dimension coordinate object associated with it.
Whilst field object t contains at least one instance of every type of data model construct, it is more common for field objects to contain a subset of the possible data constructs.Figure 19 shows the detailed description of two other cf-python field objects.The first of these (p) is of medium complexity and contains only domain axis, cell method, and dimension coordinate objects.In this case, the cell method and time dimension coordinate objects collectively state that the data are 30-year averages of monthly minima.The second of the field objects (q) is minimally complex and contains no other data model constructs, yet is still CF compliant.In this case, the data array is scalar and there are no coordinates, so domain axes are not necessary.

Interpreting CF-netCDF files
Any variable in a CF-netCDF file can always be viewed as a data variable in addition to any metadata role it may have, simply by choosing to ignore any other variables that may reference it.For example, a variable that is named by the "coordinates" attribute of a data variable is always an auxiliary coordinate variable (Sect.3.3), but this metadata status is conferred solely by the "coordinates" attribute, so by ignoring it this variable also becomes a data variable.
When a CF-netCDF file is read, a decision must be taken as to which variables are the data variables.By default, cfpython assumes that only unreferenced variables are data variables that instantiate field objects (variables temp and total_wv in Fig. 3).It is possible, however, to override this default behaviour so that some or all referenced variables instantiate field objects in addition to instantiating other, metadata objects.For example, the variable PS in Fig. 3 will always create a domain ancillary object but may also, if requested, create an independent field object for surface air pressure.
An interesting situation arises if a netCDF file contains only CF-netCDF coordinate variables and their associated dimensions.It may be natural to assume that these coordinate variables define a single domain, but without the explicit links provided by a data variable the existence of a domain cannot be assumed by the software.These coordinate variables are not referenced by a data variable but they are explicitly defined as "coordinates".When reading such a file, cf-python by default creates no field objects (because only coordinates have been defined), and also no dimension coor-  dinate objects (as dimension coordinate constructs can only exist within a field construct; see Sect.4.1).As is the case with referenced variables, it is possible to override the default behaviour such that an independent field object is instantiated from each coordinate variable.This example highlights the issues arising from the absence of an explicit domain element in CF (Sect.3.2).

Evolution of a CF data model and cf-python
A number of new features have recently been introduced in version 1.7 of the CF conventions, published in September 2017, and there is no doubt that the CF conventions will continue to evolve and meet the needs of the scientific community for representing more types of data.Any CF data model will therefore have to adapt to future enhancements.This is why our CF data model was designed with a minimal www.geosci-model-dev.net/10/4619/2017/Geosci.Model Dev., 10, 4619-4646, 2017 number of simple constructs in mind (Sect.1.1).We suggest that such a design is more flexible and therefore more likely (but not guaranteed) to meet future requirements.A CF data model could guide the development of CF by providing a framework for ensuring that proposed changes fit into CF in a logical, rather than just a pragmatic, way.A proposed enhancement would be assessed to see if its new features map onto the existing data model.If they do, then the enhancement may be incorporated in the CF conventions with no change to the existing data model.
As an example, it is instructive to consider one of the enhancements accepted into CF at version 1.7, namely the ability to store a cell measure variable (Sect.3.5) in a different netCDF file to that of its data variable (http://cf-trac.llnl.gov/trac/ticket/145).This is the first occurrence in CF of an atomic dataset not being identical to a single file and poses important questions on how software might be expected to link multiple files and on the consequences of one of the files going missing, thereby losing information from the dataset.From a CF data model perspective, however, this is clearly just an artefact of the dataset encoding, from which the data model is independent by design.Whether a dataset is in one file, or spread across many files, the logical relationships between a field construct and its various metadata constructs, such as the cell measure construct, remain unchanged.
If a change does not map onto the existing data model and cannot be modified to do so, the data model will need to be modified to accommodate the new features.This modification will be either backwards compatible or backwards incompatible.The former (preferable) case occurs if the data model may be extended or generalised in some way that allows the new features but does not affect its existing constructs and relationships.The latter case occurs if a change is required that would affect the interpretation of existing datasets and the design of software built around the data model.Much effort has been already been put into avoiding backward incompatibilities in CF, so that older datasets are still parsable by newer software, but doing so is not a rule but a "best practice" that could be overridden if the community consensus were that the benefits in doing so outweighed any inconvenience.
It is the authors' intention to ensure that the cf-python software library is kept up to date with the latest version of CF conventions and the CF data model presented here.To facilitate this, work is underway to create a reference implementation of this data model, which in essence will be like cfpython but without any of its higher-level functionality (such as regridding methods).This reference implementation will then be imported back into cf-python to provide not only its data model representation but also a "read" method for mapping datasets onto field constructs and a "write" method for mapping field constructs to new netCDF files.The reference implementation will be easier to understand and maintain than cf-python, could be used by other Python packages for manipulating CF datasets, and also has the potential to be used as a test bed for proposed new features of CF.This reference implementation will be open source (like cf-python), so any interested parties may contribute to its maintenance and development.

Summary and conclusions
In this paper, we have presented a formal data model for the CF conventions, identifying the fundamental elements of CF and showing how they relate to each other.We have described the CF conventions in terms of their relationship to the physical world (real or simulated) and in terms of their netCDF encoding, and these steps led to our identifying the elements which contribute to a CF data model.The CF conventions themselves have been influenced by their netCDF encoding, and therefore our CF data model is indirectly influenced by netCDF, although it aims to be independent of the encoding.We have discussed the relationships of our CF data model to other data models which address the problem of storing data and metadata, and we have presented a software implementation of this CF data model capable of manipulating any CF-compliant dataset.We have described possible ways in which this CF data model and the cf-python software library may evolve over time.
It is important to note that our CF data model is a description of what CF is, rather than what it ought to be, either in our opinion or anyone else's.We believe that there is little doubt that a CF data model is of considerable value, and this has been recognised by the CF community, which highlights that a CF data model will aid future developments in the CF conventions and make it easier to create CF-compliant software (http://cf-trac.llnl.gov/trac/ticket/88).In addition, the existence of a data model can help resolve conflicts in the interpretation of the conventions document.For example, a discussion on the CF mailing list regarding the interpretation of CF-netCDF scalar coordinate variables was resolved with some assistance from a developmental version of the data model (http://cf-trac.llnl.gov/trac/ticket/104).There are other discussions on unintended ambiguities that have not been resolved, however, and this is unsatisfactory for those who need to manipulate datasets or create software for that purpose.
Creating an explicit data model before the CF conventions were written would arguably have been preferable.A data model created a priori increases the likelihood that the problem space (i.e.storing and manipulating data and metadata) is fully spanned and encourages coherent implementations, which could be file storage syntaxes or software codes, the latter being a stated goal of CF.For example, in CF-netCDF, horizontal and vertical coordinate reference systems are described with very different structures -the grid mapping variable and formula_terms attribute, respectively -a situation that would likely not have occurred if a comprehensive CF data model already existed.Writing a CF data model a pos-teriori clearly cannot bring about all of these benefits, as the coverage of the problem space and file storage syntax is a given, but it can still be of use to software implementations and future developments in the conventions.
We believe that the data model proposed here is a complete and correct description of CF, because we have yet to find a case for which our implementation in the cf-python library fails to represent or misrepresents a CF-compliant dataset.Moreover, the development of cf-python proves that is possible to implement our CF data model.We consider that our CF data model is simpler and more flexible than other such models, because it defines a small number of general constructs rather than many specialised ones.While the latter approach is closer to an object-orientated software implementation, our aim is to describe CF in a way which is independent of any software.
If this CF data model were to be accepted by the community as a formal part of the CF conventions, then any future enhancements would have to be incorporated into the data model as part of the public discussion that leads to the acceptance of every enhancement.Version 1.7 of the CF con-ventions has been recently published and it is the authors' intention to review all of the new features for compatibility with this CF data model.As these enhancements have already been finalised, any conflict will necessarily force a change in the data model.Once up to date with version 1.7, the data model may then be considered in parallel with the discussions on enhancements for subsequent releases.Structural differences between different versions of the CF conventions would be plain to see if each release contains a data model, thus making it easier to write software that can cope with any backward incompatibilities that may have been introduced.
Figure1.The benefits of having a CF data model.The CF-netCDF conventions (CN) rely on the netCDF data model (NC) and, at present, a software application is forced to make its own interpretation of the CF-netCDF conventions -an interpretation that is likely to be different from that of other applications.The aim of this paper is to propose a comprehensive CF data model that provides a consistent interpretation of the conventions, thereby facilitating compatibility across applications that adopt it.This by no means precludes a variety of software implementations, as there is still considerable flexibility in mapping data model elements onto the data objects needed for a particular application.
is a special kind of class-E (but class-E is not shown on the diagram) class-B class-C Instances of class-C can be included in an instance of class-B but cannot exist independently class-B class-F Instances of class-F can be included in an instance of class-B or can exist independently class-B class-D An instance of class-B is associated with an instance of class-D but class-D is independent of class-B

Figure 4 .
Figure 4.An example domain defined by three dimensions, one of which is single valued (height).
Figure9.The nine constructs of our CF data model (corresponding to the green layer in Fig.1) described using UML (Appendix A).In this and all other CF data model diagrams, the CF data model constructs are labelled with the "construct" stereotype to distinguish them from other data model elements which appear in other diagrams as green boxes with no label.The field construct corresponds to a CF-netCDF data variable.Relationships between other constructs and CF-netCDF are given in Figs.10 and 11.The domain provides the linkage between the field construct and the constructs which describe measurement locations and cell properties.It is not a construct of the data model (see Sect. 4.1) but an abstract concept that is useful for understanding it.Similarly, it is useful to define an abstract generic coordinate construct that can be used to refer to coordinates when the their type (dimension or auxiliary coordinate construct) is not an issue.

Figure 11 .
Figure11.The relationship between coordinate reference and domain ancillary constructs (Sect.4.4 and 4.5) and CF-netCDF described using UML (Appendix A).A coordinate reference construct is defined either by a grid mapping variable or a formula_terms attribute of a CF-netCDF coordinate variable.The coordinate reference construct is composed of generic coordinate constructs, a datum, and a coordinate conversion formula.The coordinate conversion formula is usually defined by a named formula in the CF conventions.A domain ancillary construct term of a coordinate conversion formula is defined by a CF-netCDF data variable or a CF-netCDF generic coordinate variable.

Figure 12 .
Figure 12.Key concepts within the ISO 19123 view of coverages and the associated coordinate systems: a) as applied to a specific coverage: the discrete coverage, which relates a domain of objects to a a set of attribute values; b) an expanded view of the relationship of discrete grid point coverages to grid cells, grid footprints, and three different ways of thinking about the grids (set of coordinates, rectified, and referenceable grids).Note the comment box defining the two-letter labels of each element (e.g.CV means that this elements comes the the ISO 19123 data model).See also figure 9 in Dominico and Nativi (2013).Further relationships and details are discussed in the text.

Figure 12 .Figure 13 .
Figure12.Key concepts within the ISO 19123 view of coverages and the associated coordinate systems: (a) as applied to a specific coverage: the discrete coverage, which relates a domain of objects to a set of attribute values; (b) an expanded view of the relationship of discrete grid point coverages to grid cells, grid footprints, and three different ways of thinking about the grids (set of coordinates, rectified, and referenceable grids).Note the comment box defining the two-letter labels of each element (e.g.CV indicates that this element comes from the ISO 19123 data model).See also Fig.9in Dominico andNativi (2013).Further relationships and details are discussed in the text.

Figure 19 .
Figure 19.Examples of cf-python field objects of medium and minimal complexity.

Table 1 .
The UML class associations used in this paper.See Fig.A1for a worked example.

Table 2 .
The elements of the CF-netCDF conventions, a brief description of each, and the section in which it is described in more detail.The relationships to netCDF entities are shown in Fig.8.

Table 3 .
The constructs of our CF data model, a brief description of each, and the section in which it is described in more detail.The relationships between the constructs and CF-netCDF elements are shown in Figs.9-11. is usually defined either by a netCDF dimension or by a scalar coordinate variable, which implies a domain axis of size one (Sect.3.3).The field construct's data array spans the domain axis constructs of the domain, with the optional exception of size one axes, because their presence makes no difference to the order of the elements.Hence, the data array may be zero-dimensional (i.e.scalar) if there are no domain axis constructs of size greater than one.
Figure 14.Key characteristics of the Unidata Common Data Model.There is a wider variety of fundamental data types than is supported by the netCDF classic data model; and the coordinate system includes the option of coordinate axes of specific types for use in the feature types, which limits the flexibility of the CDM data model.
The cf-python class instances which correspond to CF data model constructs.