Coupling global models for hydrology and nutrient loading to simulate nitrogen and phosphorus retention in surface water – description of IMAGE–GNM and analysis of performance

. The Integrated Model to Assess the Global Environment–Global Nutrient Model (IMAGE–GNM) is a global distributed, spatially explicit model using hydrology as the basis for describing nitrogen (N) and phosphorus (P) delivery to surface water, transport and in-stream retention in rivers, lakes, wetlands and reservoirs. It is part of the integrated assessment model IMAGE, which studies the interaction between society and the environment over prolonged time periods. In the IMAGE–GNM model, grid cells receive water with dissolved and suspended N and P from upstream grid cells; inside grid cells, N and P are delivered to water bodies via diffuse sources (surface runoff, shallow and deep groundwater, riparian zones; litterfall in ﬂoodplains; atmospheric deposition) and point sources (wastewater); N and P retention in a water body is on the basis of the residence time of the water and nutrient uptake velocity; subsequently, water and nutrients are transported to downstream grid cells. Differences between model results and observed concentrations for a range of global rivers are acceptable given the global scale of the uncalibrated model. Sensitivity analysis with data for the year 2000 showed that is a major for and


Introduction
Eutrophication, induced by a surge in anthropogenic nutrient loads to the global freshwater domain (e.g., rivers, lakes and estuaries), has an increasingly negative impact on aquatic ecosystems. In order to ameliorate and reverse this trend, ecological principles must be integrated into environmental management and restoration practices. These actions require a thorough understanding of the interactions between various human-induced disturbances (e.g., climate change, land use change, nutrient loadings and hydrology regulation) and their effects on freshwater systems (Stanley et al., 2010). To fully grasp the human impact on biogeochemical cycles, studies must collectively consider the biogeochemical turnover and exchange among the atmosphere, and the aquatic and terrestrial ecosystems.
Numerical models can assess the interaction between multiple processes in various river basin environments. They can furthermore improve predictions for the regional to global nutrient flux from the land to the ocean. Integrated assessment models (IAM) have established themselves as powerful tools to study future development of complex, large-scale environmental and sustainable development issues. There are Various other model approaches exist (Bouwman et al., 2013c). The widely used regression models lump the combined effects of nutrient transformations in the continental system into a set of parameters and equations, which can ultimately predict the drainage basin discharge of various geochemical species (e.g., dissolved inorganic and organic, and particulate N, P, C; Seitzinger et al., 2005;Seitzinger et al., 2010). For our purposes, these lumped regression models have limited value, because they both ignore spatial variability of sources and sinks within river basins, and amalgamate all processes in the river continuum. They thus cannot elucidate the nonlinear behavior that results from the interplay between nutrient sources and biogeochemical processes. The SPARROW (SPAtially Referenced Regression On Watershed attributes;Smith et al., 1997;Alexander et al., 2008) model and similar hybrid approaches correlate measured stream nutrient fluxes with spatial data on nutrient sources and landscape characteristics. However, the disadvantage of such an approach is that only a limited time period is covered, while many scientific questions regarding the anthropogenic pressures on the nutrient cycles require prolonged time periods. On the other extreme, there is a range of continuous or event-based distributed watershed-scale models available, which simulate all the components of a landscape, with the hydrology as the basis of calculations. An inventory of such mechanistic models was presented by Borah and Bera (2003). These models usually focus on N while ignoring P and tend to require extensive data that may be difficult to obtain at the spatiotemporal scales of human-climate interactions, and thus are less appropriate to implement in IMAGE-GNM.
In summary, IMAGE-GNM is a global, spatially explicit model, which uses hydrology as the basis for describing N and P delivery to surface water and in-stream transport and retention. It is part of the IAM IMAGE, and used to study the impact of multiple environmental changes over time frames, which capture the mutual feedbacks between humanity and the Earth system. In this manuscript, we compare the model behavior against observations for a number of rivers, and test its sensitivity to a range of model parameter variations to analyze the impact of changing nutrient loading, climate and hydrology.

General aspects
The IMAGE model utilizes historical data for testing the model behavior, and projections to describe direct and indirect drivers of future global environmental change. Most of these drivers (such as technology and lifestyle assumptions) are used as input in various subcomponents of IMAGE such as GNM (Fig. 1). Clearly, the exogenous assumptions made on these factors need to be consistent. To ensure this, so-called storylines are used, brief descriptions about how the future may unfold, that can be used to derive internally consistent assumptions for the main driving forces of each IMAGE module. Important categories of scenario drivers include demographic factors, economic development, lifestyle and technology change. Among these, population and economic development form a special category as they can be dealt with in a quantitative sense as exogenous model drivers.
The geographical resolution of IMAGE 3.0 is 26 socioeconomic world regions (Stehfest et al., 2014). These regions are selected given their relevance for global environmental problems and a relatively high degree of internal coherence. In the Earth system, the key geographic scale is a 0.5 • × 0.5 • grid for plant growth, land cover, carbon, nutrient and water cycles. In terms of temporal scale, both systems are run at an annual time step, focusing on long-term trends to capture important inertia aspects of global environmental problems such as simultaneously changing climate and various human activities. Within the Earth system, much shorter time steps are used for water, crop and vegetation modeling. For many applications the IMAGE model deliberately runs over the historical period of 1970 until present-day in order to test model dynamics against key historical trends and then up to 2050, depending on the focus of the analysis. IMAGE-GNM is integrated in the IMAGE model framework, as it has to account for all the drivers that determine the nutrient emissions from point and diffuse sources and their transport. IMAGE-GNM is therefore a distributed model with temporal resolution of 1 year, and a spatially explicit resolution of 0.5 by 0.5 degrees.  Figure 2. Scheme of the model framework with PCR-GLOBWB and IMAGE and the data flows between the models. IMAGE provides land cover and soil budgets for N and P and IMAGE-GNM outputs the nutrient delivery to surface water via surface and subsurface runoff (see Sect. 2.4.2 and 2.4.3) (Fig. 2). IMAGE distinguishes grid cells with natural vegetation or agriculture. Within each agricultural grid cell IMAGE computes distributions of seven crop groups that are aggregated in IMAGE-GNM to larger groups (pastoral grassland, grassland in mixed systems, wetland rice, legumes and upland crops). The soil N budget (N budget ) is calculated for each of these groups and then aggregated to the level of 0.5 • × 0.5 • grid cells for individual years as follows:

PCR-GLOBWB
where N fix is biological N fixation (kg), N dep is atmospheric N deposition (kg), N fert is application of synthetic N fertilizer (kg), N man is animal manure (kg), N withdr is N removal from via crop harvesting, hay and grass cutting, and grass grazed by animals (kg), and N vol is ammonia (NH 3 ) volatilization (kg). The N budget is prone to erosion, leaching or denitrification, or can accumulate in the soil. Following the approach of Bouwman et al. (2013d), the P budget is assumed to depend on erosion, and soil accumulation. P inputs for the soil budget are fertilizer and animal manure, and outputs are crop and grass withdrawal. The data exchange between PCR-GLOBWB and IMAGE-GNM is presented in Fig. 2. Spatial land cover distributions from IMAGE and global climate data from ERA-40 reanalysis (Uppala et al., 2005) are used in PCR-GLOBWB for computing the water balance, runoff and discharge for each year. For each grid cell, IMAGE-GNM provides the delivery of N and P to water bodies via diffuse sources (surface runoff, shallow and deep groundwater, riparian zones) and point sources (wastewater) (Figs. 3 and 4). Grid cells receive water with dissolved and suspended N and P from upstream grid cells, and from diffuse and point sources within the grid cell. In each grid cell, N and  Figure 3. Scheme of the flows of water and nutrients, and retention processes within a grid cell. P retention in a water body is calculated on the basis of the residence time of the water and nutrient uptake velocity, and subsequently, water and nutrients are transported to downstream grid cells. Discharge is routed to obtain the accumulated water and nutrient flux in each grid cell, through streams, rivers, lakes, wetlands and reservoirs (Fig. 4). The various submodels for hydrology, spatially explicit nutrient delivery patterns and in-stream retention (Fig. 3), used within IMAGE-GNM are parameterized independently. Furthermore, these parameters are not calibrated in order to better understand the model behavior, identify the lacunae in the data used, and discern the influence of the various processes considered in the model. Instead, the sensitivity of different model outputs to changes in values of input data and model parameters is analyzed in order to explore our model and data.
Although part of the IMAGE framework, GNM can also be used as a stand-alone version, provided that all the input data are in the correct format. For example, land cover data and soil N budgets from various modeling groups could be used (Van Drecht et al., 2005;Fekete et al., 2011). Here we use an update of the nutrient data covering the period 1900-2000 presented by Bouwman et al. (2013d). Also, output from different hydrological models (e.g., Alcamo et al., 2003;Fekete et al., 2011) could be compared.
IMAGE-GNM is written in Python 2.7 code. The complete code is available in the Supplement.

Water balance
The land surface in PCR-GLOBWB is represented by a topsoil (0.3 m thick or less) and a subsoil (1.2 m thick or less). Precipitation falls as rain if air temperature exceeds 0 • C, and as snow otherwise. Snow accumulates on the surface, and . Scheme of the routing of water (with N and P) in a landscape with streams, rivers, lakes, wetlands and reservoirs; each type of water body within a grid cell is defined by an inflow or discharge, depth and area. Floodplains may be temporarily or permanently flooded. melt is temperature controlled. Potential evapotranspiration is broken down into canopy transpiration and bare-soil evaporation, which are reduced to an actual evapotranspiration rate based on soil moisture content. Vertical transport in the soil column arises from percolation or capillary rise, depending on the vertical hydraulic gradient present between these layers.
Precipitation and temperature are from New et al. (2000) and downscaled to daily values using the ERA-40 reanalysis (Uppala et al., 2005). Precipitation and temperature were fed directly into the model whereas secondary variables (vapor pressure, wind speed, cloud cover) were used to compute reference potential evapotranspiration using the Penman-Monteith equation according to guidelines of the Food and Agriculture Organization of the United Nations (FAO) (Allen et al., 1998). For the overlapping period 1960-2001, the actual sequence of ERA-40 years was used.
Water drains from the soil column and is delivered as specific runoff to the drainage network, consisting of direct runoff, interflow and base flow. PCR-GLOBWB simulates runoff and converts it to regulated discharge (i.e., including reservoirs; water extraction is ignored), which is used to simulate waterborne nutrient transport. First, total runoff q tot (m yr −1 ) is split into surface runoff (q sro , m yr −1 ) and excess water flow (q eff , m yr −1 ): where f qsro is the fraction of surface runoff with respect to total runoff. Surface runoff represents a large proportion of total runoff in locations where drainage into soils is restricted (e.g., urban areas with sealed surfaces, areas covered with impermeable topsoil, and locations with a steep topography) and is represented as Surface runoff is assumed to not be limited (f qsro (texture) = 1.0) in soils with very fine topsoil texture, whereas for loam and sandy loam, and for coarse sand and peat the value f qsro (texture) is adjusted to 0.75 and 0.25, respectively. The slope-runoff classification for unconsolidated sediments is implemented following Bogena et al. (2005): where S is the slope in m km −1 . Since this function is nonlinear, f qsro (slope) is the median value of all 90 m × 90 m cells within each 0.5 • × 0.5 • grid cell. Land use and soil texture can also influence the surface runoff, and these are implemented via the dimensionless factors f qsro (texture) and f qsro (land use), respectively (Velthof et al., 2007(Velthof et al., , 2009. The soil map used shows dominant soil texture, and has no bare rock class. In areas with bare rock such as in mountainous regions, slopes are generally steep, and Eq. (4) yields high values for f qsro (slope) and thus for f qsro .
Water stagnation may occur in flat land (slope < 20 m km −1 ) where soils are saturated based on the Improved Arno Scheme (Todini, 1996;Hageman and Gates, 2003). Soils that are (semi-) permanently saturated are identified as poorly drained areas and are associated with the occurrence of bogs and peat lands. Also, where percolation at the interface between soil and the groundwater reservoir is impeded (e.g., in the case of permafrost), water can stagnate and drain as topographically driven saturated interflow.
When water infiltrates, it can either flow laterally to ditches and streams or vertically to groundwater. IMAGE-GNM implements two groundwater compartments, following Van Drecht et al. (2003), De Wit andPebesma (2001) and De Wit (2001) (Fig. 3). The shallow groundwater system comprises the top 5 m of the saturated zone where water is retained over short residence times and can either enter the local surface water at short distances (< 1 m) or infiltrate into the deep groundwater system. A 50 m thick deep groundwater layer (Meinardi, 1994), is located below the shallow groundwater system and significantly contributes to the runoff. The water residence time in the deep groundwater system is much higher than that of the shallow groundwater system, as it flows more slowly at greater depths and drains into the fluvial system at greater distances (> 1 km). IMAGE-GNM assumes no deep groundwater presence (i) in areas with non-permeable, consolidated rocks; (ii) in sediments underlying surface waters (rivers, lakes, wetlands, reservoirs); and (iii) in coastal lowlands (< 5 m above sea level) where (artificial) drainage or a high groundwater level persists (Bouwman et al., 2013a).
The excess water flow q eff (Eq. 5) splits into interflow through the shallow groundwater system (q int , m yr −1 ) and deep groundwater runoff (q gwb , m yr −1 ) as follows: The partitioning f qgwb (p) of the excess water flow q eff between these two systems (Fig. 3) is based on the effective porosity (p) of the parent material (Table 1). The deep layer (if present) is assumed to have the same characteristics as the surface layer. IMAGE-GMN assumes that shallow groundwater interflow moves to the fluvial system via riparian zones (Fig. 3), except in (fractions of) grid cells with wetlands, lakes or large streams, where riparian zones are bypassed. Although riparian zones may only account for a small percentage of the drainage basin, they are critical control points for groundwater and N fluxes within many basins (Vidon and Hill, 2006). Riparian zones along small streams have long ecotone lengths within drainage networks, and may process groundwater N at faster rates than larger nearby water bodies (Bouwman et al., 2013a).

Vegetation and land cover
Vegetation effects are taken into account by partitioning the land surface by fraction into different types. Similarly, spatial variations in soil properties can be accounted for by considering effective values for each of these vegetation types. Soil characteristics are assumed to be constant under changing land cover, except for soil total available water capacity (tawc); the relative distribution of tawc varies with changing root depth distributions based on Canadell et al. (1996). All other soil parameters are from the FAO Digital Soil Map of the World (FAO, 1991) and the World Inventory of Soil property Estimates (WISE) data from the International Soil Reference and Information Center (ISRIC) World Soil Information (Batjes, 1997(Batjes, , 2002. Lithological properties (such as hydraulic conductivity) are derived from a global lithological map (Dürr et al., 2005).
Similar to earlier implementations of PCR-GLOBWB, vegetation parameters are taken from the Olson classification of the global land cover characterization (GLCC) data set with a resolution of 30 arcsec and values assigned using the parameter data set of Hagemann et al. (1999). The parameterization is adjusted to the reconstruction of agricultural land cover for 1900-2000 with 5-year time steps derived from the IMAGE model (Bouwman et al., 2013d) based on historical data (Klein Goldewijk et al., 2010 in order to achieve consistency between the simulated hydrology and imposed land use. The land cover reconstruction for the 20th century specifies the fractions of arable land and grassland within each  0.5 • × 0.5 • grid cell. To combine this information with the Olson classification, three separate maps at the original resolution of 30 arcsec were created, including (i) Olson classes that were assumed to represent semi-natural vegetation and that were spatially extrapolated per Holdridge life zone (Holdridge, 1967); (ii) Olson classes representing cropland; and (iii) Olson classes representing grassland.
For the reconstructed land cover under the two agriculturally managed conditions, i.e., crops and pasture, all 30 arcsec cells within a 0.5 • × 0.5 • cell are ranked in order of decreasing suitability from 0 to 1. This is achieved by first delineating their current extent in the GLCC and ranking on the basis of slope, computed from the Hydro1k database (Verdin and Greenlee, 1996). Next, the adjoining cells are ranked on the basis of the slope parallel distance starting from the delineated areas. These rank orders are then normalized, values near zero indicating the most suitable locations, one indicating the poorest locations, and used to match the IMAGE derived fractions for each 0.5 • × 0.5 • cell. In this procedure, cropland has priority, followed by grassland. Any remaining areas are subsequently filled with semi-natural vegetation types. On the basis of the resulting patched land cover, the land cover parameterization for PCR-GLOBWB was then derived.

Drainage network
Drainage density is computed from the Hydro1k data set (Verdin and Greenlee, 1996). The drainage network is based on the DDM30 flow direction map of Döll and Lehner (2002) and the lake characteristics taken from the Global Lakes and Wetlands Database version 1 (GLWD1) product (Lehner and Döll, 2004). Reservoirs are from the Global Reservoir and Dam (GRaND) database (Lehner et al., 2011) and introduced dynamically on the basis of the reported construction year.
The water level in lakes is constant, as the through flow will increase with increasing discharge. The water travel time is determined by the discharge and the volume of the water body. Assuming that flooding occurs once a year and that all river discharge follows the main channel, the travel time in a river with floodplains is determined as follows: where τ is the travel time (year), V is the volume of the water body (including river bed) (m 3 ), Q is the discharge (m 3 yr −1 ) and Q f is the discharge into the flooded area (m 3 yr −1 ). While the simulated discharge includes the regulating effect of reservoirs, consumptive water use has not been included as it is difficult to identify its source (groundwater, surface water) and to quantify its spatial distribution with certainty. Water bodies such as lakes and reservoirs can extend over several 0.5 • × 0.5 • grid cells and are included if their volume exceeds that of the channel within a cell. Where more than one reservoir is located within the same grid cell, they are merged and the combined storage and volume assigned to the dominant reservoir. At the start of the simulation, in 1901, 107 out of a total of 132 reservoirs of the GRaND data set are included as 88 spatially individual water bodies, corresponding to 78 % of the reported total volume of 16.4 km 3 . 98 % of the reported total volume of 5848.4 km 3 . No demand is imposed on the reservoirs and by default they are assigned the purpose of hydropower generation. In absence of pricing generation at the global scale (Haddeland et al., 2006;Adam and Lettenmaier, 2008), this results in an operation that maximizes the available potential energy. In this case, this conforms with 75 % of the maximum storage capacity in absence of detailed global data. The remaining 25 % are reserved to buffer inflow for flood control purposes. Reservoir release is linearly scaled to storage when reservoir storage falls below 30 % of the available capacity. This reduced outflow also results in a realistic, gradual filling of reservoirs after completion of dam construction.

Nutrient delivery to surface water
Surface and subsurface runoff are calculated from the soil N and P budgets on the basis of the hydrological flows provided by PCR-GLOBWB. Other nutrient sources that are directly delivered to surface water included in IMAGE-GNM are wastewater from urban areas, aquaculture, allochthonous organic matter, weathering and atmospheric deposition.

Nutrients directly delivered to surface water
N and P inputs from wastewater for the 20th century are from Morée et al. (2013), and those from freshwater aquaculture are calculated using the country-scale model estimates of Bouwman et al. (2013b) for finfish  for shellfish using data for the period 1950-2000 from FAO (2013); data indicate that prior to 1950 aquaculture production was negligible. N and P emissions from aquaculture are allocated within countries using three weighing factors, i.e., population density, presence of surface water bodies, and mean annual air temperature. For population density, all grid cells with no inhabitants and those with more than 10 000 inhabitants km −2 are excluded; around an optimum density of 1000 inhabitants km −2 , a steep parabolic function on the left and less steep on the right are used to calculate the weighing. Lakes, reservoirs, rivers and wetlands have the maximum weight for water bodies, and floodplains and intermittent lakes only half of that; all other types have a weight of zero. Grid cells with mean annual air temperature <0 • C are excluded for aquaculture. The three weighing factors are combined by multiplication to obtain the overall weight (range = [0,1]). Then all grid cells with overall probability < 10 % are excluded for aquaculture, yielding the map for allocation for all years. Subsequently, the country production for shellfish and finfish are allocated separately. Grid cells with fish production less than a threshold are excluded for that particular year, and the remaining grid cells are used to allocate the N and P emissions from shellfish and finfish based on the weighing map.
Allochthonous organic matter input to surface water is an important flux in the global C cycle (Cole et al., 2007). This could be an important source of nutrients, but so far its magnitude has not been investigated. Here, estimates of NPP from IMAGE for wetlands and floodplains are used. Part of annual NPP is assumed to be deposited in the water during flooding, and where flooding is temporary, the litter from preceding periods is assumed to be available for transport in the flood water. The mass ratio of litter to belowground inputs of organic matter ranges from 30 : 70 to 70 : 30 (Vogt et al., 1986;Trumbore et al., 1995); 50 % of total NPP is assumed to end in the surface water. N and P inputs to the water are estimated based on a C : N ratio of 100 and a C : P ratio of 1200 (Vitousek, 1984;Vitousek et al., 1988).
The calculation of P release from weathering is based on a recent study , which uses the lithological classes distinguished by Dürr et al. (2005). The lithological classes are available on a 5 by 5 min resolution; hence, the weighted average P concentration within each 0.5 • × 0.5 • grid cell is calculated, and the P RivLoadWeath (kg P yr −1 ) is computed as follows: where C PWeath (g m −3 ) is the background concentration specified for each lithological class (Table 1) and derived from river runoff data, q tot is the total runoff (m yr −1 ), A gridcell is the land area (m 2 ) in the grid cell considered, SS corr is a correction factor for soil shielding, E a,w is the activation energy (J mol −1 ) (Table 1), K the local mean annual air temperature (Kelvin) and R the molar gas constant (8.3144 J mol −1 K −1 ). The soil shielding correction SS corr is a correction factor of 0.1 leading to a 90 % reduction for FAO soil units (FAO/Unesco, 1988) Ferralsols, Acrisols, Nitosols, Lixisols, Gleysols (soils with hydromorphic properties) and Histosols (organic soils). For all other soils SS corr = 1 (no reduction). With this approach, regions with the same lithology but with more precipitation have higher P-weathering losses than regions in dry climates. Atmospheric N deposition to water bodies is from the ensemble of reactive-transport models for the year 2000 (Dentener et al., 2006), and the years before that were made by scaling the deposition with grid-based emissions of ammonia (Bouwman et al., 2013d). The deposition in floodplains, wetlands and river channels is ignored, because it is already part of the soil N budget, and does not need to be accounted for in periods of flooding.

Surface runoff
IMAGE-GNM distinguishes two surface runoff mobilization pathways for nutrients, i.e., losses from recent nutrient applications in the form of fertilizer, manure or organic matter (N sro,rec , P sro,rec ) (Hart et al., 2004), and a memory effect (N sro,mem , P sro,mem ) related to long-term historical changes in soil nutrient inventories (McDowell and Sharpley, 2001;Tarkalson and Mikkelsen, 2004): Estimates of soil loss by rainfall erosion from Cerdan et al. (2010) based on a large database of measurements were used as a basis for calculating P sro,mem and N sro,mem . The approach presented by Cerdan et al. (2010) based on slope, soil texture and land cover type were used to estimate country aggregated soil-loss rates for arable land, grassland and natural vegetation. Soil loss from peat soils was assumed to be low (equal to fine texture). These estimates were then adjusted to obtain the mean erosion loss estimates for Europe (360 t of soil per km 2 for arable fields, 40 t km −2 for grassland and 15 t km −2 for natural vegetation). The model was then applied to all grid cells of the world. For global grasslands this yields an erosion rate of 60 t of soil per km 2 , which exceeds the European rate by 50 % due to larger erosivity of grasslands in especially tropical and (semi-)arid climates.
As the model keeps track of all inputs and outputs in the soil P budget, the actual P content can be calculated. The initial P stock for the year 1900 in the top 30 cm is taken from Yang et al. (2013). All inputs and outputs of the soil balance are assumed to occur in the top 30 cm; the model replaces P enriched or depleted soil material lost at the surface by erosion with fresh soil material (with the initial soil P content) at the bottom. For N the soil organic C content, which is assumed to be constant over time, is used as a basis to calculate N in eroded soil material using land-use-specific C : N ratios (soil C : N for arable land is 12, for grassland 14 and for soils under natural vegetation 14) (based on Brady, 1990;Batjes, 1996;Guo and Gifford, 2002;McLauchlan, 2006). Hence, with changing land use, the N content in soil erosion loss will also change. P sro,rec and N sro,rec are calculated from the N and P input terms (Eq. 1) on the basis of f qsro (Eq. 4). For N the equation is where f cal is a correction coefficient of 0.3 to match the N runoff results of the Miterra model (Velthof et al., 2007(Velthof et al., , 2009).

Subsurface nitrogen removal and delivery
Subsurface transport of P is neglected, as P is easily absorbed by soil minerals; leaching of P may occur only in P-saturated soils with long histories of heavy over-fertilization; below the saturated soil layer, P will be absorbed into the minerals occurring there, which are low in P. All the positive values of the soil N budget (Eq. 1) are subjected to leaching. Leaching from the top 1 m of soil (or less for thinner soils) is a fraction of the soil N budget excluding the N lost by surface runoff (f leach,soil ; Van Drecht et al., 2003): where f leach,soil is The fraction of N lost by denitrification (f den,soil ) complements f leach,soil (f den,soil = 1−f leach,soil ). f text , f drain and f soc represent factors that address the soil texture, aeration and soil organic carbon (C) content, respectively (Table 2). Finetextured soils are more susceptible to reach and maintain anoxia, which favors denitrification, as they are characterized by higher capillary pressures and hold water more tightly than sandy soils. Denitrification rates tend to be higher in poorly drained than in well-drained soils (Bouwman et al., 1993). The soil organic C content is used as a proxy for the C supply, which can have a direct impact on the soil oxygen concentrations. f landuse is the land use effect on leaching, where arable land has a value of 1, and grassland and natural vegetation a value of 0.36 (Keuskamp et al., 2012). The factor f climate (-) combines the effects of temperature, water residence time, and NO − 3 in the root zone on denitrification rates. f climate is the product of the temperature effects on denitrification (f K , -) and the mean annual residence time of water and NO − 3 in the root zone (T r,so , yr): The temperature effect f K follows the Arrhenius equation (Firestone, 1982;Kragt et al., 1990;Shaffer et al., 1991): where E a,d is the activation energy (74830 J mol −1 ), K the mean annual temperature (Kelvin) and R is the molar gas constant (8.3144 J mol −1 K −1 ). T r,so is calculated via: where tawc (m) is the total available water capacity for the top 1 m (or less if thinner) of soil and q eff is described in Eq. 5. Based on the negligible retardation of NO − 3 , the water and NO − 3 residence times are assumed to be the same. Soils used for agricultural crops in dry regions with T r,so < 1 receive a T r,so value of 1.0 assuming that irrigation is required to grow crops in these locations.
Arid regions under grassland or natural vegetation have long residence times according to Eq. (14), and results in values of f climate and f den,soil equal 1, implying that denitrification removes all the N. This representation is not realistic, since N can accumulate in the vadose zone below the root zone as nitrate (Walvoord et al., 2003), and can escape via surface runoff, ammonia-N volatilization and denitrification (Peterjohn and Schlesinger, 1990). It is not possible to quantify the relative contribution of each process (Peterjohn and  Schlesinger, 1990), but it is clear that only a negligible part of N surpluses in arid climates is lost by denitrification. Denitrification was thus neglected from the fate of N surplus in soils receiving an annual precipitation of < 3 mm and overlain with grasslands and natural vegetation. For the year 2000, N surplus in the 3100 Mha of global arid lands was 20 Tg. The N concentration C N in the excess water leaching from the root zone (depth z = 0) is represented by the ratio of leached N over q eff (Eq. 5): The groundwater N concentration varies according to the historical year of infiltration into the saturated zone and the denitrification (including anammox) during groundwater advection (Böhlke et al., 2002;Van Drecht et al., 2003). The time available for denitrification is represented by the mean travel time T r,aq , which is the ratio of the specific groundwater volume and the water recharge: where D is aquifer thickness (m) and can either be for shallow groundwater (D sgrw = 5 m) or for deep groundwater (D dgrw = 50 m) following Meinardi (1994). q inflow is either the shallow groundwater recharge (q int , m yr −1 ) or deep groundwater recharge, (q gwb , m yr −1 ). The vertical drainage of the shallow groundwater feeds the deep groundwater (Fig. 3). The vertical flow distribution for the shallow system is uniform; therefore, the travel time can be equated to the mean travel time. In contrast, travel times for lateral flows to the fluvial system vary considerably. The travel time distribution for lateral flow in a vertical cross section is represented by Meinardi (1994): where g age (yr) is the age of groundwater at a specific depth, and z (m) is the depth in the aquifer (i.e., z = 0 at the top of the aquifer and z = D at the bottom of the aquifer). Denitrification takes place during transport in the shallow system along the various flow paths in a homogeneous and isotropic aquifer, drained by parallel rivers or streams. IMAGE-GNM simulates the effects of denitrification in N concentrations at time t and depth z (C N (t, z)) through a firstorder degradation reaction, leading to an exponential decay Eq. for the nitrogen concentration: where t is time and the decay rate k is obtained via the halflife of nitrate (dt50 den ) due to denitrification: Lithology can have a direct effect on denitrification, and thus dt50 den (Dürr et al., 2005). Siliciclastic material exhibits low dt50 den values of 1 yr −1 , whereas alluvial material has dt50 den values of 2 yr −1 and all other lithology classes have a dt50 den value of 5 yr −1 ( Table 1). The N concentration in water percolating to deep groundwater represents the outflow from shallow groundwater. IMAGE-GMN assumes that denitrification is absent in deep groundwater. Although denitrification could occur in organic matter-and/or pyrite-rich deep aquifers, denitrification measurements in the literature have a bias toward high rates (Green et al., 2008), which makes their global assessment difficult. Following Beusen et al. (2013), nitrogen transported through submarine groundwater discharge (SGD) is excluded from the delivery to rivers and other water bodies. This assumption is justified, since, only 10 % of the gridded map could contribute to SGD. The remaining aquifer discharge in the grid box goes towards streams and rivers.
While urban areas can have an effect in the N loss to the environment (e.g., Foppen, 2002;Wakida and Lerner, 2005;Van den Brink et al., 2007;Nyenje et al., 2010), the total urbanized land represents 0.3 % of the total land area (Angel et al., 2005), and thus it is neglected from the model. The median NH 4 concentration in groundwater of 25 European aquifers is 0.15 mg L −1 (Shand and Edmunds, 2008), which represents a small part (0.7-1.2 %) of the nitrogen concentration (EEA, 2012), and thus NH 4 in groundwater is also neglected.

N transport and removal in riparian zones
Modeling geochemical processes in riparian zones require a detailed hydrological and geographical information at very high spatial scales, since, even at 0.1 km resolution, the topography of the riparian area cannot be adequately assessed (Vidon and Hill, 2006). IMAGE-GNM therefore uses a conceptual approach.
In riparian zones, denitrification rates depend highly on the local pH (Knowles, 1982;Simek and Cooper, 2002), temperature, water saturation, NO − 3 availability and soil organic carbon availability. Previous laboratory studies of pure cultures have shown that denitrification is maximized at a pH of 6.5 to 7.5, and decreases at both low (below 4) and high (above 10) pH values (Van Cleemput, 1998;Van den Heuvel et al., 2011).
As with soil denitrification, riparian zone denitrification is calculated using dimensionless reduction factors and is based on the characteristics of the groundwater flow, soil and climate. Heterotrophic denitrification is assumed to be highest at pH > 7 ( Van den Heuvel et al., 2010). A pH reduction factor f denpH,rip is then used to reduce the value with decreasing pH, such that f denpH,rip = 1 at pH > 7 and 0 at pH < 3 (Fig. 5).
where N in is the nitrogen that enters the riparian zone from the shallow groundwater.
where f climate is the product of f K (Eq. 13) and the water (and NO − 3 ) travel time through the riparian zone (T r,rip ). T r,rip depends on the thickness of the riparian zone (D rip ≤ 0.3 m, depending on the soil thickness), on the available water capacity for the top 1m of the riparian zone (tawc), and on the flow of water entering the riparian zone from the shallow groundwater (q int ) :

In-stream nutrient retention
Three processes contribute to N retention, i.e., denitrification, sedimentation and uptake by aquatic plants. Denitrification is generally the major component of N retention (Saunders and Kalff, 2001). P is removed by sedimentation and sorption by sediment (Reddy et al., 1999). Retention in a grid cell is calculated as a first-order approximation according to where R is the fraction of the nutrient load that is removed (-), v f is the net uptake velocity (m yr −1 ), E is the nutrient considered (N or P), and H L is the hydraulic load (m yr −1 ) obtained from where D is the depth of the water body (m), τ is the residence time (yr) and τ is calculated from the volume V (m 3 ) of the water body and the discharge Q (m 3 yr −1 ): for all water bodies except for river channels and floodplains where the discharge Q is reduced by the water volume in the floodplains (see Eq. 6). In this approach hydrological (defined by H L ) and biological and chemical factors (defined by v f ) controlling retention are isolated, assuming first-order kinetics is applicable (i.e., areal uptake changes linearly with concentration). Net uptake velocity is different for each element E (N or P). For N, the basic value for all water body types of 35 m yr −1 taken from (Wollheim et al., 2006(Wollheim et al., , 2008a is modified based on temperature and N concentration: where t is annual mean temperature ( • C) and C N is the N concentration in the water. f (C N ) describes the effect of concentration on denitrification as a result of electron donor limitation in the case of high N loads; the results of Mulholland et al. (2008) were mimicked by assuming a decrease of f (C N ) from a value of 7.2 at C N = 0.0001 mg L −1 to 1 for C N = 1 mg L −1 , a further decrease to 0.37 for C N = 100 mg L −1 and constant at higher concentrations.
The temperature effect f (t) is calculated as where α = 1.0717 for N (following Wollheim et al., 2008a and references therein) and α = 1.06 for P (following Marcé and Armengol, 2009). For P, the basic value for v f of 44.5 m yr −1 taken from Marcé and Armengol (2009) is used for all water body types, with a modification based on temperature: v f,P = 44.5 f (t) .
The drainage network of PCR-GLOBWB represents streams and rivers of Strahler order (Strahler, 1957) 6 and higher. The parameterization of lower-order streams follows the approach presented by Wollheim et al. (2008b). A globally uniform subgrid river network is included for all grid cells without lakes or reservoirs. It is assumed that PCR-GLOBWB has one river of order 6 in each grid cell, and all lower-order rivers are lacking. The river network is then defined on the basis of stream length and basin area of the first-order river. The mean length ratio R L (-) is used to calculate the stream length of the next higher order the river according to with L n being the stream length of order n (km); L 1 = 1.6 km. The drainage area ratio R a (-) is used to calculate the basin area for higher-order stream as follows: where A n is basin area of order n in km 2 ; A 1 = 2.6 km 2 . With the stream number ratio R b (-) the number of lower-order streams is calculated as with R n being the number of streams of order n in this grid cell; R b = 4.5. The discharge for each stream is calculated with the runoff (q): with the discharge of stream order n (Q n ) in m 3 s −1 , runoff in mm yr −1 and C Q the unit conversion (C Q = 1000/(3600 × 24 × 365)). The midpoint discharge of a stream length of order n is calculated as Q mid,n = Q n + 0.5Q n−1 .
The width of the stream of order n is calculated as where W n = width (m), A is a constant (A = 8.3 m) and coefficient B = 0.52. It is now possible to calculate the hydrologic load (H L ) and thus the retention of the stream according to with C Q1 being the conversion from seconds to years (C Q1 = 3600 × 24 × 365), C Q2 the conversion from km to m (1000) and H L in m yr −1 . The local diffuse load in a grid cell is spatially uniformly distributed over the streams. Here, the fraction of the total stream length per order is used to calculate the distribution of the load. The direct load is allocated to stream-order n as follows: where F d,n is the fraction of the total load, which is direct input for streams of order n. The pathway of the outflow of the streams is determined according to a matrix T i,j representing the fraction of the outflow of stream-order i to stream-order j , whereby T i,j = 0.0 for i ≥ j . For i < j , T i,j is calculated as follows: The calculation of the retention is performed for each stream order, starting with order n = 1, and is identical to the calculation of the PCR-GLOBWB schematization. The load of a stream is the sum of the direct load and the sum of the outflow from lower-order streams.

Data analysis
For the comparison of observations for individual monitoring stations or ad hoc measurements in rivers and simulated concentrations of river water, we use the root mean squared error (RMSE) expressed as a percentage. RMSE is calculated as follows: where O is the mean of the observations, O i is observation i, M i is the simulated value i and n is the number of data pairs. We consider values of 50 % acceptable in view of the global scale of the model. The sensitivity of the modeled delivery, retention and river export for the year 2000 to variation of 48 model parameters for N and 34 for P is based on parameter-specific distributions between a minimum and maximum value around the standard parameter values (Table 3). The sensitivity analysis was performed using the Latin hypercube sampling (LHS) technique (Saltelli et al., 2000). LHS is a multi-parameter,  Table 3. Model parameters included in the sensitivity analysis, their symbol and description, for which nutrient it is used, and the standard, minimum, mode and maximum value considered for the sampling procedure. Parameters are listed in alphabetical order of their symbol. Water volume of all water bodies N/P U1 1.0 0.9 1.1 * Samples values are applied to all grid cells. For sampling, either uniform of triangular distributions are used. A triangular distribution is a continuous probability distribution with lower limit a, upper limit b and mode c, where a ≤ c ≤ b. The probability to sample a point depends on the skewness of the triangle. In the case of dt50 den,dgrw , ac = bc, and probability to sample a point on the left and right hand side of c is the same. In other cases, for example, fqsro(crops) is a fraction (range = [0,1]), with standard value of 1.0. To achieve a high probability to sample close to 1.0, the triangle is designed with b = 1 and c is close to 1. For some of the above distributions the expected value is not equal to the standard. Since the calculated R 2 for all output parameters exceeds 0.99, this approach for analyzing the sensitivity is still valid. stratified sample method based on subdividing the range of each of the k parameters into disjunct equiprobable intervals or runs (Num). By sampling one value in each of the Num intervals according to the associated distribution in this interval, Num sampled values are obtained for each parameter. Num was 500 for P and 750 for N.
The sampled values for the first model parameter are randomly paired to the samples of the second parameter, and these pairs are subsequently randomly combined with the samples of the third source, and so forth. This results in an LHS consisting of Num combinations of k parameters. The parameter space is thus representatively sampled with a limited number of samples.
The uncertainty contributions of each input parameter (X i ) can be further assessed by combining LHS with linear regressions with respect to the model outputs (Y i ) via Saltelli et al. (2000Saltelli et al. ( , 2004: where β i is the so-called ordinary regression coefficient and e is the error of the approximation. The linear regression model can be evaluated using the coefficient of determination (R 2 ), which represents the Y variation as explained by Y −e. β i depends on the scale and dimension of X i , the sensitivity study can be normalized by rescaling the regression equation using of the standard deviations for Y and X (σ Y and σ Xi , respectively) and calculating the standardized regression coefficient (SRC i ): SRC i can take values in the interval [−1,1]. SRC is the relative change Y /σ Y of Y due to the relative change X i /σ X i of the parameter X i considered (both with respect to their standard deviation σ ). Hence, SRC i is independent of the units, scale and size of the parameters, and thus sensitivity analysis comes close to an uncertainty analysis. A positive SRC i value indicates that increasing a parameter value will cause an increase in the calculated model output, while a negative value indicates a decrease in the output considered caused by a parameter increase. The sum of squares of SRC i values of all parameters equals the coefficient of determination (R 2 ), which for a perfect fit equals 1. Hence, SRC 2 i /R 2 yields the contribution of parameter X i to Y . For example, a parameter X i with SRC i = 0.1 adds 0.01 or 1 % to Y in case R 2 equals 1.

Comparison with measurement data
We first compared the IMAGE-GNM model results with observed concentrations for two stations in the rivers Rhine and Meuse and at 11 stations in the Mississippi, USA (see Supplement). Stations near the river mouth (Lobith at the Rhine, Eysden at the Meuse, and St. Francisville, Louisiana, for the Mississippi) are shown first. The latter station was selected for comparison due to its widespread use in literature, for example by the US Geological Survey analysis of water quality (US Geological Survey, 2009). The measured concentrations were first aggregated to annual discharge-weighed concentrations, whereby for the US data years with < 6 observations were excluded. The model performance for the river Rhine for N concentrations (RMSE = 15 %) is better than for the Meuse and Mississippi (Fig. 6a, b, d, e, g, h). IMAGE-GNM overestimates N concentrations in the river Meuse (RMSE = 31 %) in almost all years; the model underestimates N concentrations in the early 1980s for the Mississippi, while its performance is better from the second half of the 1980s (RMSE for Mississippi = 23 %). P concentrations in the Mississippi are consistently underestimated (RMSE = 51 %) (Fig. 7a, b, d, e, g, h). P concentrations are overestimated in the Rhine in all years with data, although the declining trend is well captured (RMSE = 28 %). The modeled P concentrations are close to observations in the Meuse, with deviations in both directions (RMSE = 36 %).
The residues (observation minus simulation) for the observed vs. simulated concentrations of N and P (Figs. 6c and 7c) in the Mississippi show a very clear trend from overestimation at low concentrations to underestimation at high concentrations. The residues show a trend in the Rhine, with a slight increase along with increasing concentrations (Figs. 6f  and 7f). The Meuse also shows such trends, although less clear. For P the residue increases with increasing concentration, and for N the opposite occurs (Figs. 6i and 7i).
Since the deviations from observed concentrations can stem from errors in the hydrology, we compared the simulated vs. observed discharges (Fig. 8). Results for the Mississippi (Fig. 8a) show a good agreement but with overestimation in most years. While the RMSE is 19 % for the Mississippi, there is no consistent trend between residue and discharge, indicating no systematic error (Fig. 8b). The RMSE for the discharge of the Rhine is 14 %, with a consistent underestimation by the model (Fig. 8c), and the residues show a clear increase with observed discharge (Fig. 8d), indicating a systematic error in the model. For the Meuse, the RMSE for the discharge is 23 %, the discharge seems to be overestimated (Fig. 8e), and there is only a small trend between discharge and residue (Fig. 8f).
Overall, while discharge is overestimated in the Mississippi, N and P concentrations are underestimated in most years, indicating that part of the problem is in the hydrology. The hydrology model consistently underestimates discharge, while N concentrations are underestimated in most years, and P is underestimated in the first period up till about 1980, and after this year there is a slight overestimation. So apparently errors in the hydrology cannot explain those in the nutrient concentrations. The discharge of the Meuse is over- We also investigated the model performance for 10 more stations in various states within the Mississippi River basin (Table 4). These stations along with the St. Francisville station form the monitoring network for nine subbasins in the Mississippi (US Geological Survey, 2007). The plotted data for all 11 stations in Mississippi River basin are available as separate graphs in the Supplement. The model performance is acceptable (RMSE < 50 %) for eight stations for N concentrations and five stations for P concentrations. There are some stations where the model poorly simulates the N concentrations such as Arkansas River and Red River (Table 4). Such high RMSE values do not occur for P. In general, simulated P concentrations are closer to observed values than N concentrations.
One of the reasons for poor agreement is the large fluctuation of discharge, load and concentration at some stations. Apparently, these peaks are associated with periods of high rainfall. We do not know if these peak values represent the full period of the measurement interval. For example, a peak value that represents 2 months (in the case there are six measurements per year) also yields a peak in the aggregated annual value. However, it is not known if this peak actually represents 1 day (with a much lower aggregated annual value) or 2 months. In contrast to St. Francisville, P concentrations (and N concentrations) at the other stations are not consistently underestimated or overestimated. Furthermore, at this level of comparison, the spatial data for land use and wastewater discharge locations in urban areas may not be realistic. For example, our wastewater discharge occurs in all grid cells with urban population, while in reality discharge takes place in discrete locations such as wastewater treatment plants.
A further performance test involves a direct comparison between aggregated data and model results for a large number of European rivers (see Supplement) (European Environment Agency, 2013). This data set includes monitoring data at different stations for 125 rivers, 49 for N and 76 for P. River basins with less than four grid cells, of ∼ 2500 km 2 each, were removed because river basin areas of < 10 000 km 2 do not have adequate spatial data representation. This is an arbitrary choice, and probably many river basins with 4-10 grid cells also suffer the problem of poor spatial data. Measurements for some stations were removed from the data set as outliers (Table S1). Results for all measurements show a coefficient of determination of 0.59 and RMSE of 124 % for N (n = 709) and 0.58 and RMSE of 184 % for P (n = 1010) ( Fig. 9a and b). Results show reasonable coefficients of determination (r 2 ) of 0.79 and RMSE of 112 % for P and 0.55 and RMSE of 95 % for N ( Fig. 9c  and d). The average of all measurements for N and P is slightly lower than the simulated concentrations (0.16 vs. 0.25 mg P L −1 and 1.25 vs. 1.78 mg N L −1 ). The mean of observations and model values over the monitoring period for each station showed good agreement ( Fig. 9e and f). There is also good agreement between model and data for the mean for all stations for each year with deviations never exceeding 1 mg N L −1 and 0.2 mg P L −1 (Fig. 9e and f). It is clear that the model has problems when modeling individual stations in small rivers in the database. The plotted data for all stations in the European rivers (available as separate graphs in the Supplement) show that the model results for the Danube, for example, are in good agreement with observations for two stations. Most simulated concentrations are within a factor of 2 of the observed concentrations in the EEA database (EEA, 2012). Our model results also show a fair agreement with the validation data set for the early 1990s for total N collected by Van Drecht et al. (2003) (Fig. 10). Modeled total N concentrations for the Amazon for the early to mid-1980s (0.7-0.9 mg L −1 ) are close to measured values (0.4-0.5), and results for total P (0.07 mg L −1 ) are also close to observations (0.06 mg L −1 ) (Forsberg et al., 1988;Meybeck and Ragu, 1995).
These comparisons of our model output with data at various aggregation levels show that IMAGE-GNM based on three calibrated submodels (hydrology, nutrient input and instream removal) performs very well without any tuning of the overall, integrated model. We have deliberately chosen to not further tune the model so that we can identify its shortcomings. Further improvement of model performance requires a sensitivity analysis.

Model sensitivity
The influence of a range of parameters on model sensitivity was investigated for modeled N and P delivery, retention and river export. Here we discuss only those parameters that are significant and have an SRC value > 0.2 or < −0.2 (parameters that add > 4 % to the delivery, retention or river export). Results presented in Tables 5 and 6 show that the sensitivity of N delivery, retention and river export for the year 2000 differs from that of P in many aspects. Total runoff (q tot ; Eq. 5) is significant for retention and river export of both N and P; runoff largely determines all transport pathways and flows of N (runoff, leaching, groundwater flow and also in-stream retention), and it determines P runoff, the major transport pathway for P. The soil N budget in natural ecosystems and arable land (N budget,crops ; N budget,nat ; Eq. 1) are important factors for the N delivery, but not for the retention and river export. For P the soil budgets are less important, because soil P content (P soil ) and bulk density (B soil ) govern the runoff of P more than the budget; actually, soil P content is actually a result of the long-term soil P budget.
Our model results suggest that allochthonous organic matter input to stream is an important but uncertain nutrient source. The factors determining the allochthonous organic matter input of N to streams and rivers (flooded area, A flooding ; litterfall, AOMI; its reduction factor for litterfall, F AOMI ; and its C : N ratio) are similarly important for the delivery and river export of N. For P both the parameters determining allochthonous inputs and weathering (C PWeath ; Eq. 7) are not significant nor important, as the biomass from litterfall contains only small amounts of P and because the anthropogenic sources are dominant.
For the modeling of river retention, the sensitivity analysis for a range of parameters shows that net uptake velocity (V f,river,N ; V f,river,P ; Eqs. 23, 26, 28) and mean length ratio (R L ; Eq. 29) are important for retention and river export for both N and P, and logically not for nutrient delivery. The assumption that N retention depends on N concentrations (N conc,low ; Eq. 26) is significant in all years for the retention and river export. Temperature (Temp; Eq. 27) is important for retention of P, and for retention and river export of N.
Results of the sensitivity analysis differ from previous studies (e.g., Bouwman et al., 2013a), primarily because the current model includes additional sources (allochtonous inputs) and changes in the model for surface runoff and leaching.

Future improvements
On the basis of the comparison with measurements and the model sensitivity, we can now analyze what parts of the model need improvement. Improvements are possible in both data and model components. Many components and data are ignored in this discussion, including all the data stemming from the IMAGE on soils, lithology, land use, vegetation distribution, nutrient cycles in agriculture and natural ecosystems and climate. We recognize that updates of the data used in this paper are now available. For example, soil data (http://www.isric.org/content/soilgrids), hydrographic information (http://hydrosheds.cr.usgs.gov/index.php) and lithology (Hartmann and Moosdorf, 2012) and associated porosity and permeability data (Gleeson et al., 2014). These updates have a finer resolution, allowing more specific calculation of surface characteristics (bare rock, more detailed soil texture classes, etc.). Hence, these updates and additional data sets will be considered for future improved versions of the model, and tested with new sensitivity analyses. It is difficult to know from the available analyses what could be done to improve the model, because error may be the result of uncertainties in the input data (land use, climate, hydrology, wastewater flows, etc.), in surface and subsurface processes or in-stream processes. However, two parts of the model have a dominant importance for the model results, i.e., total runoff from the water balance model PCR-GLOBWB and the factors determining the in-stream biogeochemistry including the uptake velocity and factors used in the parameterization of sub-grid processes for streams and  rivers of Strahler orders of 6 and less. Here we do not touch upon improvements of the hydrology model and focus on the nutrient-related processes, but see a clear need for improvement of the way the water flow in lakes and reservoirs is simulated, i.e., only the water that actually enters and leaves the lake is considered, with no role for the total water mass. Also, there is a need to improve the geohydrological information in order to better describe global aquifers, their thickness and their denitrification potential.
To improve the in-stream process description, the first short-term improvement is to add processes in sediments to allow for simulating P saturation of sediments and desorption in case of decreasing river P loads. The current model version uses air temperature as a proxy for water temperature. A clear improvement would be to use water temperatures in the spiraling approach, since there may be large differences, especially in low-order streams. Other examples are large rivers influenced by cooling water from nuclear or other power plants. The river Meuse is such an example, and the overestimation of N concentrations may be caused by underestimation of the water temperature.
The importance of factors such as the P content of the soil call for attention to the description of the processes determining P (and N) transport to surface water via surface runoff. Our approach distinguishes an instant transport route, and the transport of soil material with the memory simulated by changing P content of the soil. The delay of the transport may be an important aspect to consider, but at present we have no data available to do so.
Longer-term improvements center on the incorporation of a mechanistic model for describing biogeochemical processes in the water column and sediment. This allows for further analysis of individual processes and their interplay (plant uptake, sedimentation, benthic processes, denitrification). This will involve a change to a temporal resolution that matches the requirements of the description of the biogeochemical processes (day, week, month). Mechanistic modeling of in-stream processes with shorter time steps requires a further refinement of the processes on the land, i.e., the temporal distribution of fertilizer application, manure spreading and grazing. This will also allow us to analyze the delay between rainfall events causing runoff and the discharge to the surface water. Also, such mechanistic models require a delivery and in-stream model that distinguishes different nutrient forms.
Mechanistic modeling also allows for the coupling of the processes of C with the nutrients N, P and Si, which may lead to better understanding of the C and nutrient fluxes to and from river basins. Regarding spatial scale, the current 0.5 • × 0.5 • resolution is large enough to assume that there are no interactions between grid cells. Future models at finer resolutions need to consider the fact that transport and processes may cross boundaries of grid cells.

Conclusions
The performance of our global nutrient model is similar to that of the more commonly used empirical approaches. The comparisons of our model output with data at various aggregation levels show that our model based on three submodels (hydrology, nutrient delivery and in-stream retention) performs very well without any calibration. We have deliberately chosen to not further tune the model so that we can identify its shortcomings.
IMAGE-GNM can simulate the present-day river nutrient export at the basin and global scales with acceptable deviations from observed values for large rivers, and gen-erally within a factor of 2 for small European rivers. The model can also be used to explore changes in various processes and interactions between them during the 20th century. More specifically, the IMAGE-GNM model allows for attributing changes in nutrient transport, retention and export to changes in hydrology and nutrient delivery or their interactions . It will therefore be a very valuable research tools to examine the effect of hydrological measures or climate-induced changes on nutrient processing and export and therefore on the functioning of downstream ecosystems.
Moreover, GNM is fully integrated into the integrated assessment model IMAGE and can thus provide nutrient transport and processing estimates fully consistent with scenarios based on, for example, the story lines of the shared socioeconomic pathways currently in use by the global climate change community (Kriegler et al., 2014).
An interesting application of IMAGE-GNM is to study the impacts of increasing river export, i.e., eutrophication of coastal marine ecosystems leading to phenomena such as increased production and hypoxia. The changing nutrient stoichiometry in freshwater and coastal systems may lead to phenomena such as harmful algal blooms. Such analyses require coupling our model to coastal biogeochemistry models.