Introduction
The transport and dispersion of gaseous and particulate pollutants are often
simulated to generate pollution forecasts for emergency responses or produce
comprehensive analyses of the past for better understanding of the particular
events. Lagrangian particle dispersion models are particularly suited to
provide plume products associated with emergency response scenarios. Although
accurate air pollutant source terms are crucial for quantitative
predictions, they are rarely available in most applications and have to be
approximated using many assumptions. For instance, the smoke forecasts
over the continental US operated by the National Oceanic and Atmospheric
Administration (NOAA) using the Hybrid Single-Particle Lagrangian Integrated Trajectory
(HYSPLIT) model in
support of the National Air Quality Forecast Capability (NAQFC) rely on
outdated fuel loading data and a series of assumptions related to smoke
release heights and strength approximation.
Observed concentration, deposition, or other functions of the atmospheric
pollutants such as aerosol optical thickness measured by satellite
instruments can be used to estimate some combination of source location,
strength, and temporal evolution using various source term estimation (STE)
methods. Among the applications, the recent Fukushima
Daiichi nuclear power plant accident saw the most implementations of
STE methods to estimate the radionuclide releases. The STE methods range from
simple comparisons between model outputs and measurements to
sophisticated ones using various dispersion models and inverse modeling
schemes. Another active
field for STE applications is the estimation of volcanic ash emissions.
Many attempts have been made for several major volcanic eruptions.
Although many STE methods have been applied to reconstruct emission terms,
source term estimation is still a developing field. Two popular advanced inverse modeling approaches
are cost-function-based optimization methods and those based on Bayesian
inference. However, it is difficult to evaluate the STE without knowing the
actual sources for most applications. Initial inverse experiment tests often use pseudo-observations
generated with the same dispersion model; such tests are often called “twin experiments”. They
allow observational errors to be added realistically,
but it is non-trivial to represent the model errors incurred by other model
parameters such as the uncertainties of the meteorological field. One way to
objectively evaluate the inverse modeling results is to compare the
predictions with the independent observations or withheld data. However, such
indirect comparisons still cannot provide quantitative error statistics for
the source terms.
There have been some tracer experiments conducted to study the atmospheric
transport and dispersion with controlled releases. In these experiments, the
source terms were well-quantified and comprehensive measurements were made
subsequently over an extended area. With the
known source terms, they provide a unique opportunity to evaluate the STE
methods. Measurements from a recent
dispersion experiment (Fusion Field Trials 2007) have been used to evaluate a
least-squares technique for identification of a point release. The European
Tracer Experiment (ETEX) data set was also used to study the STE methods
based on the principle of maximum entropy and a least-squares cost function.
However, such formal evaluation of the STE
methods is still very limited.
A HYSPLIT inverse system based on 4D-Var data assimilation and a transfer
coefficient matrix (TCM) was developed and applied to estimate a cesium-137
source from the Fukushima nuclear accident using air concentration
measurements. The system was further developed to estimate the
effective volcanic ash release rates as a function of time and height by
assimilating satellite mass loadings and ash cloud top heights.
In this study, the Cross-Appalachian Tracer Experiment
(CAPTEX) data are used to evaluate the HYSPLIT inverse modeling system. The
paper is organized as follows. Section describes the CAPTEX
experiment, HYSPLIT-4 model configuration, and the source term inversion
method. Section presents emission inversion results, and a
summary is given in Sect. .
Results
Recovering emission strength without model uncertainty
As an initial test, the exact release location and time are both assumed
known and the only unknown variable left to be determined is the release
rate or the total release amount. For this type of one-dimensional problem,
an optimal emission strength can be easily found without having to use
sophisticated minimization routines. For instance, the cost function F may be
directly calculated for a number of emission strength values, and the
resulting F=F(q) plot will reveal the optimal emission strength q
that is associated with the minimal F. Note that such an
optimal solution depends not only on the chosen parameters in
Eq. but also strongly on the HYSPLIT model setup and
the meteorological fields.
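The direct scan described above can be illustrated with a minimal sketch. It assumes a single unknown release rate, a linear source-receptor relation (predicted concentration equals a transfer coefficient times q), and a cost function made of a background term plus a weighted model-observation mismatch. The function names (`cost`, `scan_minimum`) and all numbers are hypothetical and synthetic; they are not CAPTEX data or the HYSPLIT implementation.

```python
def cost(q, tcm, obs, eps, qb=0.0, sigma=1.0e4):
    """F(q) for one unknown release rate q (kg h-1).

    tcm[m] is the predicted concentration at sample m for a unit release,
    so the predicted concentration is tcm[m] * q (assumed linear relation).
    """
    background = 0.5 * (q - qb) ** 2 / sigma ** 2
    mismatch = 0.5 * sum(((t * q - c) / e) ** 2 for t, c, e in zip(tcm, obs, eps))
    return background + mismatch


def scan_minimum(tcm, obs, eps, q_values, **kwargs):
    """Return the q on the scanned grid with the smallest F."""
    return min(q_values, key=lambda q: cost(q, tcm, obs, eps, **kwargs))


# Synthetic, made-up inputs: pseudo-observations generated with q = 67 kg h-1
tcm = [2.0, 5.0, 0.5, 10.0]             # pg m-3 per (kg h-1), hypothetical
obs = [t * 67.0 for t in tcm]           # perfect-model pseudo-observations
eps = [0.2 * c + 20.0 for c in obs]     # eps = fo*co + ao with fo=20 %, ao=20
q_grid = [0.1 * k for k in range(2001)]  # 0 to 200 kg h-1 in 0.1 steps
q_best = scan_minimum(tcm, obs, eps, q_grid)
print(q_best)
```

Because the pseudo-observations here come from the same linear model, the scan recovers the release rate used to generate them; with real measurements and an imperfect model the minimum shifts, which is the behavior examined in the rest of this section.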
Previous studies showed that the HYSPLIT dispersion
model performed better for release 2 than for the other releases. Thus, release 2
is initially chosen to perform a series of inverse modeling tests.
Assuming no prior knowledge of the emission strength, the first guess is
given as qb=0, and σ=10^4 kg h-1 is assumed. Sensitivity
tests show that when qb is changed to 100 kg h-1, the emission
strength estimates are nearly unchanged with the same or larger σ.
Firstly, no model uncertainties are considered to contribute to ϵ.
The observational uncertainties are formulated to include a fractional
component fo×co and an additive part ao. Note that this general
formulation is chosen for its simplicity. It should be replaced when more
uncertainty information is available. Table lists the
emission strength q that generates the minimal cost function for a series
of fo and ao combinations, where fo ranges from 10 % to 50 %,
and ao is assigned as 10, 20, and 50 pg m-3. All the emission
strength values obtained are significantly lower than the actual release of
67 kg h-1. The results show that a larger fo value tends to yield a smaller
q estimate, whereas a larger ao results in a larger q. The significant
underestimation of the release strength is caused by the implicit assumption
of a perfect model when ϵ does not include the model uncertainties.
Figure shows the comparison between the predicted and
measured concentrations when the actual release rate of 67 kg h-1 is
applied. Large discrepancies still exist even when the exact release is known
and used in the simulation. For the measured zero concentrations, most of the
predicted values are non-zero and can be above 1000 pg m-3. As
ϵm=ao for these zero concentrations, the terms
$(c_m^h-c_m^o)^2/\epsilon_m^2$ will dominate the cost function when ao is not large enough.
This explains why the underestimation is not as severe for ao=50 pg m-3 as for ao=10 pg m-3. While ϵ does
not change with fo for the zero concentrations, smaller fo values help
increase the weighting of the terms $(c_m^h-c_m^o)^2/\epsilon_m^2$
associated with large measured concentrations. So, the estimated emission
strength when fo=10 % is better than when fo=50 %.
Comparison between the predicted and measured concentrations for
release 2 during the CAPTEX experiment. In the HYSPLIT simulation, at the
exact release location, an emission rate of 67 kg h-1 was applied from
17:00 to 20:00 Z on 25 September 1983. A constant 1 pg m-3 is
added to both predicted and measured concentrations to allow logarithm
calculation.
As stated in , the metric variable in Eq. () can be
changed to ln(c), i.e., replacing (cmh-cmo) with
ln(cmh)-ln(cmo). A constant 0.001 pg m-3 is added to both
cmh and cmo to allow the logarithm operation for zero concentrations.
In such a case, $\epsilon_m^{\ln(c)}$ can be calculated as
$$\epsilon_m^{\ln(c)} = \ln\left(1 + f^o + \frac{a^o}{c_m^o}\right).$$
Note that 0.001 pg m-3 is also added to cmo in the second term
to avoid dividing by zero. The $a^o/c_m^o$ term in
Eq. () makes $\epsilon_m^{\ln(c)}$ larger for measured low
concentrations than for measured high concentrations. It gives more
weight to measured high concentrations and results in the overestimation
shown in Table . The measured zero concentrations have
little effect on the final emission strength estimates.
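As a rough illustration of why low and zero measured concentrations carry little weight under the logarithm metric, the sketch below evaluates the uncertainty formula above for three measured values. The helper name `eps_ln` and the sample concentrations are hypothetical; fo=20 % and ao=20 pg m-3 are taken from the parameter ranges used in the text.

```python
import math

SMALL = 0.001  # pg m-3, constant added to avoid dividing by zero


def eps_ln(c_obs, fo, ao):
    """Observational uncertainty of ln(c): ln(1 + fo + ao / c_obs)."""
    return math.log(1.0 + fo + ao / (c_obs + SMALL))


fo, ao = 0.2, 20.0
zero = eps_ln(0.0, fo, ao)      # measured zero concentration
low = eps_ln(10.0, fo, ao)      # low measured concentration
high = eps_ln(1000.0, fo, ao)   # high measured concentration

# The weight of each term in the cost function is 1 / eps^2
for label, e in (("zero", zero), ("low", low), ("high", high)):
    print(label, round(e, 3), round(1.0 / e ** 2, 4))
```

The uncertainty grows as the measured concentration decreases, so the corresponding weight 1/ϵ^2 shrinks, and it is smallest for the measured zeros.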
Table shows that the emission strengths are
overestimated but are within a factor of 2 over the actual release of
67 kg h-1, for all fo and ao combinations. Similar trends in
how q changes with fo and ao are also observed here; i.e., a larger
ao or a smaller fo tends to yield a larger q estimate.
Emission strength of release 2 that minimizes F for
different observational errors, defined as ϵ=fo×co+ao.
Concentration is used as the metric variable.
Emission (kg h-1)   ao=10 pg m-3   ao=20 pg m-3   ao=50 pg m-3
fo=10 %             7.1            11.1           17.4
fo=20 %             4.1            7.1            12.6
fo=30 %             2.9            5.2            10.0
fo=50 %             1.8            3.4            7.1
Emission strength of release 2 that minimizes F for
different observational errors, defined as ϵ=fo×co+ao.
Logarithm concentration is chosen as the metric variable; i.e.,
(cmh-cmo) in Eq. () is replaced with ln(cmh)-ln(cmo).
Emission (kg h-1)   ao=10 pg m-3   ao=20 pg m-3   ao=50 pg m-3
fo=10 %             115.2          119.8          124.7
fo=20 %             106.3          112.9          119.8
fo=30 %             101.2          108.5          116.3
fo=50 %             94.4           101.2          109.6
While using logarithm concentration as the metric variable yields better
emission estimates than using concentration as the metric variable, the
results in Table are apparently systematically
overestimated compared to the systematically underestimated results in
Table . In addition, the fo and ao combinations
associated with the best emission estimates in Tables and
appear to be in the opposite corners of the tables.
Recovering emission strength with model uncertainty
To consider the model uncertainties in a simplified way, ϵ2 will be formulated as
$$\epsilon_m^2 = \left(f^o \times c_m^o + a^o\right)^2 + \left(f^h \times c_m^h + a^h\right)^2.$$
As ao and ah affect ϵ2 in a similar way, the
representativeness errors caused by comparing the measurements with the predicted
concentrations averaged in a grid can be included in either ah or ao.
With logarithm concentration as the metric variable,
$(\epsilon_m^{\ln(c)})^2$ comprises two parts:
$$\left(\epsilon_m^{\ln(c)}\right)^2 = \left[\ln\left(1 + f^o + \frac{a^o}{c_m^o}\right)\right]^2 + \left[\ln\left(1 + f^h + \frac{a^h}{c_m^h}\right)\right]^2.$$
Note that a constant small number (0.001 pg m-3) is added to
denominators cmo and cmh to avoid dividing by zero.
Since the predicted concentrations cmh in Eqs. () and
() will vary when source term estimates change, the model
uncertainties will depend on the current release parameters. Thus, the model
uncertainty terms are not static during the inverse modeling and they change
along with the source estimates. Using concentration and logarithm
concentration as the metric variable, respectively,
Tables and show the emission
strength estimates with different fh and ah, while keeping fo=20 %, ao=20 pg m-3. Additional tests with other chosen fo
and ao values show similar but slightly different results. For brevity,
they are not presented here. It should be noted that the model uncertainties
are not equivalent to model errors. Although dispersion model simulations can
have large errors due to various reasons including the source term
uncertainties, the model uncertainties are used to indicate that the model is
not perfect even with the “optimal” model parameters. Similar to the weak
constraint applied in operational 4D-Var data assimilation systems
, introducing model uncertainties is mainly intended to
relax the model constraint for imperfect models. Here, the fh and ah
parameters are given similar ranges to those given to the observational
uncertainty parameters.
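To make the dependence of the uncertainty terms on the current source estimate concrete, the following sketch evaluates both formulations for one sample at several trial release rates. It assumes the predicted concentration scales linearly with q through a transfer coefficient. The names `eps2_conc` and `eps2_ln`, the transfer coefficient, and the measured value are hypothetical illustrations, not values from the experiment.

```python
import math

SMALL = 0.001  # pg m-3, added to denominators to avoid dividing by zero


def eps2_conc(c_obs, c_mod, fo, ao, fh, ah):
    """Squared uncertainty with concentration as the metric variable."""
    return (fo * c_obs + ao) ** 2 + (fh * c_mod + ah) ** 2


def eps2_ln(c_obs, c_mod, fo, ao, fh, ah):
    """Squared uncertainty with logarithm concentration as the metric variable."""
    obs_part = math.log(1.0 + fo + ao / (c_obs + SMALL))
    mod_part = math.log(1.0 + fh + ah / (c_mod + SMALL))
    return obs_part ** 2 + mod_part ** 2


# One hypothetical sample: transfer coefficient (pg m-3 per kg h-1) and measurement
tcm_m, c_obs = 5.0, 300.0
fo, ao, fh, ah = 0.2, 20.0, 0.2, 20.0

rows = []
for q in (10.0, 67.0, 120.0):
    c_mod = tcm_m * q  # predicted concentration for this trial release rate
    rows.append((q,
                 eps2_conc(c_obs, c_mod, fo, ao, fh, ah),
                 eps2_ln(c_obs, c_mod, fo, ao, fh, ah)))
for row in rows:
    print(row)
```

In this sketch the concentration-metric uncertainty grows with the trial release rate through the fh×ch term, while the logarithm-metric uncertainty shrinks because ah is divided by a larger predicted concentration.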
When concentration is used as the metric variable, the emission strength
estimates with model uncertainties considered are improved over those without
model uncertainties. The estimates of emission strength generally increase
with the model uncertainty, through either ah or fh, except for
fh=50 %, for which the q estimates slowly decrease with ah. With fh=0 %, ao=20 pg m-3, and ah=10, 20, and 50 pg m-3,
the q estimates of 7.7, 9.1, and 13.6 kg h-1 are in line with the
results shown in Table , where q=7.1 kg h-1 for
ao=20 pg m-3 and q=12.6 kg h-1 for ao=50 pg m-3.
However, the trend of how q estimates change with fh is opposite to how
q estimates change with fo. Table shows that the
emission strength increases with the model uncertainty factor fh. With
fh=20 %, the release estimates of 48.5, 50.4, and 53.5 kg h-1
are all within 30 % of the actual release rate of 67 kg h-1.
Instead of the underestimation shown in Table , the release
estimates are overestimated when fh=50 % is assumed.
With logarithm concentration as the metric variable, a larger ah or fh
results in slightly smaller q estimates. While how q estimates change
with fh is similar to how they change with fo in
Table , how q estimates change with ah is opposite
to how q estimates change with ao before introducing model
uncertainties. Equation () shows that fo and fh
affect $(\epsilon_m^{\ln(c)})^2$ in a simple monotonic way, while the effect
of ah is complicated, as it is divided by the cmh value that varies
with the source terms. Table shows that the source
terms are no longer overestimated as they are in Table .
In fact, all cases have slight to moderate underestimation, with the worst
result being q=42.6 kg h-1 when fh=50 % and ah=50 pg m-3. Another aspect of using logarithm concentration as the
metric variable is that the range of the release estimates listed in
Table is not as large as that in
Table , which resulted from using concentration as the metric
variable for the same 12 combinations of ah and fh.
Emission strength of release 2 that minimizes F for
different fh and ah. Concentration is taken as the metric variable.
$\epsilon^2=(f^o\times c^o+a^o)^2+(f^h\times c^h+a^h)^2$, with fo=20 % and ao=20 pg m-3.
Emission (kg h-1)   ah=10 pg m-3   ah=20 pg m-3   ah=50 pg m-3
fh=0                7.7            9.1            13.6
fh=10 %             15.9           22.1           32.9
fh=20 %             48.5           50.4           53.5
fh=50 %             114.0          111.8          104.3
Emission strength of release 2 that minimizes F for
different fh and ah. Logarithm concentration is taken as the metric
variable. $(\epsilon_m^{\ln(c)})^2=[\ln(1+f^o+a^o/c_m^o)]^2+[\ln(1+f^h+a^h/c_m^h)]^2$, with fo=20 % and ao=20 pg m-3.
Emission (kg h-1)   ah=10 pg m-3   ah=20 pg m-3   ah=50 pg m-3
fh=0                64.7           58.5           53.5
fh=10 %             61.5           55.7           49.4
fh=20 %             58.5           53.0           46.6
fh=50 %             55.1           49.4           42.6
Cost function normalization
Without model uncertainties, the weighting terms for each model–observation
pair do not change with emission estimates. When $\epsilon_m^2$ and
$(\epsilon_m^{\ln(c)})^2$ are formulated as in Eqs. () and
(), respectively, they vary with emission estimates. This
may cause complications in some circumstances when logarithm concentration is
used as the metric variable. To avoid having zero source as a global
minimizer in such situations, the sum of the weights of the mismatch between
model simulation and observations is kept unchanged for varying qij by
normalizing it with the weight sum when qij=qijb, as shown in
Eq. ().
$$F = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{N}\frac{(q_{ij}-q_{ij}^b)^2}{\sigma_{ij}^2} + \frac{1}{2}\sum_{m=1}^{M}\frac{(c_m^h-c_m^o)^2}{\epsilon_m^2} \times \frac{\sum_{m=1}^{M} 1/(\epsilon_m^b)^2}{\sum_{m=1}^{M} 1/\epsilon_m^2}$$
Figure shows the cost function as a function of source
strength when $(\epsilon_m^{\ln(c)})^2$ is defined as in
Eq. (), with fh=0, ah=50 pg m-3,
fo=10 %, ao=20 pg m-3. Before introducing cost function
normalization, the global minimum of the cost function appears when the release strength
approaches zero, while a local minimum exists at
56.8 kg h-1. Several such instances were found when
ah=50 pg m-3 and when fh is 0 or 10 %, while both fo and
ao are relatively small. The smaller cost function when the release strength
approaches zero is due to the increasing $(\epsilon_m^{\ln(c)})^2$ in
Eq. () as cmh gets smaller. While the
model–observation differences are not smaller for lower release strength, the
drastic increase of $(\epsilon_m^{\ln(c)})^2$ when ah=50 pg m-3 and
fh is 0 % or 10 % results in a smaller cost function with decreasing
source strength.
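The normalization can be sketched for a single unknown release rate as follows. This is a minimal illustration under stated assumptions: a linear source-receptor relation, logarithm concentration as the metric variable, and the combined observation and model uncertainty of ln(c). The function names, the transfer coefficients, and the observations are hypothetical and synthetic; only the uncertainty parameters (fh=0, ah=50 pg m-3, fo=10 %, ao=20 pg m-3) follow the case discussed above.

```python
import math

SMALL = 0.001  # pg m-3, added before taking logarithms or dividing


def weights(q, tcm, obs, fo, ao, fh, ah):
    """1 / (eps_m^ln(c))^2 per sample, with predictions c_m^h = tcm[m] * q."""
    out = []
    for t, c_obs in zip(tcm, obs):
        c_mod = t * q
        obs_part = math.log(1.0 + fo + ao / (c_obs + SMALL))
        mod_part = math.log(1.0 + fh + ah / (c_mod + SMALL))
        out.append(1.0 / (obs_part ** 2 + mod_part ** 2))
    return out


def normalized_cost(q, tcm, obs, qb, sigma, fo, ao, fh, ah):
    """Background term plus mismatch term rescaled to the weight sum at q = qb."""
    w = weights(q, tcm, obs, fo, ao, fh, ah)
    w_b = weights(qb, tcm, obs, fo, ao, fh, ah)
    scale = sum(w_b) / sum(w)  # keeps the total weight fixed as q varies
    mismatch = sum(
        wi * (math.log(t * q + SMALL) - math.log(c + SMALL)) ** 2
        for wi, t, c in zip(w, tcm, obs)
    )
    return 0.5 * (q - qb) ** 2 / sigma ** 2 + 0.5 * mismatch * scale


# Synthetic, made-up inputs (not CAPTEX data)
tcm = [2.0, 5.0, 0.5, 10.0]        # pg m-3 per (kg h-1)
obs = [150.0, 300.0, 0.0, 700.0]   # pg m-3
params = dict(fo=0.1, ao=20.0, fh=0.0, ah=50.0)
q_grid = [0.5 * k for k in range(1, 401)]  # 0.5 to 200 kg h-1
q_best = min(q_grid, key=lambda q: normalized_cost(q, tcm, obs, 0.0, 1.0e4, **params))
print(q_best)
```

The scale factor is the same for every sample, so it does not change the relative weighting among samples at a given q; it only prevents the total weight from collapsing as the predicted concentrations approach zero.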
Cost function as a function of source strength when
$(\epsilon_m^{\ln(c)})^2$ is defined as in Eq. () before
and after cost function normalization, with fh=0, ah=50 pg m-3,
fo=10 %, and ao=20 pg m-3.
Figure shows that the cost function has the minimum at
q=67.3 kg h-1 after normalization. Note that the dramatic difference
in the cost function magnitude before and after the normalization is due to
the extremely small value of $\sum_{m=1}^{M} 1/(\epsilon_m^b)^2$ calculated
at qb=0. Tables and show
the emission strength estimates after cost function normalization with
different fh and ah, while keeping fo=20 %, ao=20 pg m-3, using concentration and logarithm concentration as the
metric variables, respectively. Note that fo=20 % was chosen for the
cases listed in Table , while fo=10 % was
chosen in Fig. to illustrate the potential problem.
How the estimates change with fh and ah in Tables and is similar to
what is shown in Tables and . The estimates
are generally closer to the actual release than those obtained without the cost function normalization.
With concentration as the metric variable and fh=50 %, the
emission strength estimates are 64.7, 64.7, and 65.3 kg h-1 for
ah=10, 20, and 50 pg m-3, respectively. They are all within 5 % of
the actual release rate. However, fh less than or equal to 20 %
results in significant underestimation. With logarithm concentration
as the metric variable, the source term estimates are not very sensitive to
fh and ah values, and the results listed in
Table are all within 20 % of the actual
release rate. Among those estimates, a result of 67.3 kg h-1 when
fh=10 % and ah=10 pg m-3 is almost identical to the actual
release rate.
Emission strength of release 2 that minimizes normalized
F defined in Eq. () for different fh and ah.
Concentration is taken as the metric variable. $\epsilon^2=(f^o\times c^o+a^o)^2+(f^h\times c^h+a^h)^2$, with fo=20 % and ao=20 pg m-3.
Emission (kg h-1)   ah=10 pg m-3   ah=20 pg m-3   ah=50 pg m-3
fh=0                7.7            9.1            13.6
fh=10 %             10.9           15.1           26.4
fh=20 %             32.9           35.6           41.3
fh=50 %             64.7           64.7           65.3
Emission strength of release 2 that minimizes normalized
F defined in Eq. () for different fh and ah.
Logarithm concentration is taken as the metric variable.
$(\epsilon_m^{\ln(c)})^2=[\ln(1+f^o+a^o/c_m^o)]^2+[\ln(1+f^h+a^h/c_m^h)]^2$, with fo=20 % and ao=20 pg m-3.
Emission (kg h-1)   ah=10 pg m-3   ah=20 pg m-3   ah=50 pg m-3
fh=0                69.3           64.0           62.1
fh=10 %             67.3           63.4           60.9
fh=20 %             65.3           61.5           59.1
fh=50 %             61.5           58.0           55.1
Ensemble
The CAPTEX releases were previously simulated using a variety of PBL schemes. In that configuration, WRF version 3.5.1 was
used with 27 km grid spacing and 33 vertical layers. The NARR data set was
used for the initial conditions and lateral boundary conditions. The WRF
model was initialized every day at 06:00 UTC, and the first 18 h of spin-up
time in the 42 h simulation were discarded. The PBL schemes used to create
the WRF ensemble were the Yonsei University (YSU),
Mellor–Yamada–Janjic (MYJ), quasi-normal scale elimination
(QNSE), MYNN 2.5 level TKE (MYNN), ACM2,
Bougeault and Lacarrere (BouLac),
University of Washington (UW), total energy mass flux
(TEMF), and Grenier–Bretherton–McCaa (GBM)
schemes. Nine simulations were conducted with the PBL schemes and their
associated surface layer schemes, except for the YSU, BouLac, UW, and GBM
cases in which the MM5 Monin–Obukhov surface scheme was applied. The
land-surface model was the Noah land-surface model, except for ACM2,
for which the Pleim–Xiu land-surface model was used.
An individual TCM is generated using each of the nine simulations. The nine TCMs can be used to estimate
the emission strengths independently following the same procedure described previously.
Tables and show the
third (25th percentile), fifth (median), and seventh (75th percentile) emission strengths of
the nine estimates that minimize the normalized F defined in
Eq. () with different fh and ah, while keeping fo=20 %, ao=20 pg m-3, using concentration and logarithm
concentration as the metric variables, respectively. The 25th percentile and
75th percentile values are mostly within 5 % of the median estimates.
While the median estimates show the same trends with fh and ah as the
results in Tables and , they
are significantly larger due to the meteorological model differences.
Apparently, the differences among the simulations with different PBL schemes
are smaller than the differences between the ensemble simulations here and
the simulation used in the earlier sections. This suggests that uncertainties
of the emission strength are probably larger than the ranges indicated by the
25th and 75th percentile values. The results using logarithm concentration as
the metric variable are quite robust with the listed model uncertainty
parameters. However, the estimates using concentration as the metric variable
are very sensitive to fh and ah. This is consistent with results shown
in Sect. and .
Instead of using each individual TCM generated from nine simulations
independently, the nine TCMs can be combined into one matrix by taking the
median or average values. The combined TCM can then be used to estimate the
source terms. The results for concentration and logarithm concentration
metric variables are listed in Tables and
, respectively. They show that the emission
estimates using the median transfer coefficients of the nine TCMs are very
close to the median of the nine estimates using the nine simulations
individually. For the cases with logarithm concentration as the metric
variable, the emission estimates using the median value of the nine TCMs are
all within 3.1 % of the median values of the nine estimates obtained with
each individual TCM. For the cases with concentration as the metric variable,
the average relative differences are 6.4 %, with the maximum relative
difference being 10.8 % when fh=10 % and ah=50 pg m-3.
Combining the TCMs by taking the median value generates slightly better
results than combining the TCMs by taking the average value does.
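Combining the ensemble TCMs element by element can be sketched as below. The example assumes each TCM is stored as a list with one transfer coefficient per measurement (the single-source case); the helper name `combine_tcms` and the three small member TCMs are hypothetical, chosen only to show how the median and the average respond differently to one outlying member.

```python
from statistics import mean, median


def combine_tcms(tcms, how=median):
    """Element-wise combination of equally sized TCMs, one list per ensemble member."""
    return [how(values) for values in zip(*tcms)]


# Three hypothetical ensemble members, two measurements each
members = [
    [1.0, 10.0],
    [2.0, 20.0],
    [9.0, 0.0],   # a member that differs strongly from the other two
]

tcm_median = combine_tcms(members, how=median)
tcm_mean = combine_tcms(members, how=mean)
print(tcm_median, tcm_mean)
```

The combined TCM can then be passed to the same inversion procedure used for a single simulation. The element-wise median is less affected by a single outlying member than the average; whether this accounts for the slightly better results obtained with the median here is not established by this sketch.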
Similar to what was found in earlier sections,
the cases having logarithm concentration as the metric variable
generally yield better results than those having concentration as the metric variable.
This is probably due to the large range of the concentrations.
With concentration as the metric variable, certain model uncertainty parameters
yield good source terms, but the estimates are quite sensitive to
the choices of the model uncertainty parameters.
Moreover, it is not easy to find model uncertainty parameters that would yield satisfactory
results for applications in which the actual releases are unknown.
The results here and in the previous sections show that the
estimates having logarithm concentration
as the metric variable are quite robust for a reasonable range of model uncertainty parameters.
For these reasons, logarithm concentration is chosen as the metric variable for the later tests.
The third (25th percentile), fifth (median), and seventh (75th percentile)
emission strengths of nine simulations of release 2 that minimize the
normalized F defined in Eq. () for different fh and
ah. Concentration is taken as the metric variable. $\epsilon^2=(f^o\times c^o+a^o)^2+(f^h\times c^h+a^h)^2$, with fo=20 % and ao=20 pg m-3.
Emission (kg h-1)   ah=10 pg m-3        ah=20 pg m-3        ah=50 pg m-3
fh=0                6.0, 7.0, 7.2       7.4, 8.8, 8.8       13.4, 15.1, 15.3
fh=10 %             20.0, 21.0, 21.9    23.9, 26.1, 27.2    33.2, 35.2, 37.4
fh=20 %             48.5, 49.9, 59.1    53.0, 54.6, 62.8    58.5, 62.8, 68.6
fh=50 %             191, 205, 274       186, 197, 258       158, 168, 207
The third (25th percentile), fifth (median), and seventh (75th percentile)
emission strengths of nine simulations of release 2 that minimize normalized
F defined in Eq. () for different fh and ah.
Logarithm concentration is taken as the metric variable.
$(\epsilon_m^{\ln(c)})^2=[\ln(1+f^o+a^o/c_m^o)]^2+[\ln(1+f^h+a^h/c_m^h)]^2$, with fo=20 % and ao=20 pg m-3.
Emission (kg h-1)   ah=10 pg m-3        ah=20 pg m-3        ah=50 pg m-3
fh=0                102, 106, 113       93.4, 100, 105      83.8, 88.9, 97.2
fh=10 %             97.2, 102, 108      88.9, 96.3, 101     80.5, 85.4, 94.4
fh=20 %             93.4, 98.2, 105     86.3, 92.5, 98.2    78.1, 82.9, 91.6
fh=50 %             88.9, 93.4, 101     82.9, 88.0, 94.4    75.8, 81.3, 87.2
Emission strength estimates by using the average and median value of
nine simulations for release 2. The cost function is normalized F
as in Eq. (). Concentration is taken as the metric variable.
$\epsilon^2=(f^o\times c^o+a^o)^2+(f^h\times c^h+a^h)^2$, with fo=20 % and ao=20 pg m-3.
Each cell lists the estimate using the average, followed by the estimate using the median.
Emission (kg h-1)   ah=10 pg m-3   ah=20 pg m-3   ah=50 pg m-3
fh=0                7.2, 7.5       8.9, 9.1       15.6, 15.9
fh=10 %             22.3, 23.4     22.2, 28.0     37.0, 37.0
fh=20 %             55.1, 53.0     59.7, 58.0     66.6, 64.7
fh=50 %             213, 227       205, 213       178, 177
Emission strength estimates by using the average and median value of
nine simulations for release 2. The cost function is normalized F
as in Eq. (). Logarithm concentration is taken as the metric
variable. $(\epsilon_m^{\ln(c)})^2=[\ln(1+f^o+a^o/c_m^o)]^2+[\ln(1+f^h+a^h/c_m^h)]^2$, with fo=20 % and ao=20 pg m-3.
Each cell lists the estimate using the average, followed by the estimate using the median.
Emission (kg h-1)   ah=10 pg m-3   ah=20 pg m-3   ah=50 pg m-3
fh=0                115, 108       105, 100       95.3, 90.7
fh=10 %             110, 103       100, 95.3      91.6, 87.2
fh=20 %             105, 100       97.2, 92.5     88.9, 85.4
fh=50 %             100, 96.3      93.4, 88.9     86.3, 82.1
Source location and other releases
In addition to the source strength, the source location and its temporal
variation can be retrieved with adequate accuracy using the HYSPLIT inverse
system described here if there are sufficient measurements available. For
instance, 99 six-hourly emission rates of the radionuclide
cesium-137 from the Fukushima nuclear accident were estimated using 1296 daily average air
concentration measurements at 115 stations around the globe. Here, the
system's capability to locate a single source will be tested using a
straightforward approach. In these tests, the release time is assumed known,
but its location and strength are left to be determined. A suspected source region
is first gridded at a certain spatial resolution to form a limited number of
candidate source locations. An optimal strength is then found at each
candidate source location following the method described earlier. The
location that results in the best match between the predicted and the
observed concentrations is considered as the likely source location.
In the following tests, an 11×11 grid with 0.2° resolution in
both longitude and latitude directions is used to generate 121 candidate
source locations. They are centered at 40.0° N, 84.5° W,
for releases 1–4, and centered at 46.6° N, 80.8° W, for
releases 5 and 7. Using the normalized F defined in
Eq. () and assuming fo=20 %, ao=20 pg m-3, fh=20 %, and ah=20 pg m-3, a minimal cost function associated
with an optimal release strength can be found at each location. When
logarithm concentration is taken as the metric variable, the emission
estimates are not sensitive to fh and ah choices, as indicated by the
results in Tables , ,
and . Figure shows the 121
candidate locations and their respective minimal cost function values for
release 2. No candidate location is chosen to coincide with the actual
source location, which will be unknown in future applications that need
to locate the sources. A global minimum is found at 39.8° N,
84.5° W, with
Fmin=3.14 achieved when q=48.5 kg h-1. This grid
point is taken as the estimated source location and it is 26.4 km away from
the actual release site (39.90° N, 84.22° W). The
neighboring location (39.8° N, 84.3° W), which is the
closest to the actual release site, yields a slightly larger
F=3.17 with an optimal release rate of 60.9 kg h-1. If
the exact source location is known as in the tests presented earlier, the
cost function F reaches 1.59 at its minimal point when
q=61.5 kg h-1. Apparently, compared with those cases when the
release strength is the only unknown, finding both the source location and
its strength with the same amount of observations is expected to be more
difficult. Note that the smaller normalized F values in
Fig. are for a case with different observation and model
uncertainty parameters, where fo=10 %, ao=20 pg m-3, fh=0 %, and ah=50 pg m-3.
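The candidate grid and the distance between the selected grid point and the actual release site can be reproduced with a short sketch. The great-circle (haversine) formula and the mean Earth radius of 6371 km are assumptions made here for illustration; the text does not state how the distances were computed. The function names are hypothetical, and only the two (F, q) pairs quoted above for release 2 are used.

```python
import math


def candidate_grid(center_lat, center_lon, n=11, step=0.2):
    """n x n candidate source locations centered on (center_lat, center_lon)."""
    half = n // 2
    return [(round(center_lat + step * (i - half), 1),
             round(center_lon + step * (j - half), 1))
            for i in range(n) for j in range(n)]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere (assumed mean Earth radius)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


grid = candidate_grid(40.0, -84.5)   # releases 1-4; west longitude is negative
actual = (39.90, -84.22)             # actual site of release 2

# Minimal F and optimal q (kg h-1) quoted in the text for two candidates
reported = {(39.8, -84.5): (3.14, 48.5), (39.8, -84.3): (3.17, 60.9)}
best = min(reported, key=lambda loc: reported[loc][0])
distance = haversine_km(best[0], best[1], actual[0], actual[1])
print(len(grid), best, round(distance, 1))
```

In a full search, the dictionary would hold the minimal cost function and optimal release rate for all 121 candidates, each obtained by repeating the single-location inversion with the TCM computed for that candidate.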
Table lists the source location and strength
estimations for the six releases following the same procedure as described
here, where the uncertainty parameters are fo=20 %, ao=20 pg m-3, fh=20 %, and ah=20 pg m-3. Releases 1
and 4 have the minimal cost function Fmin occurring at the north
boundary and the west boundary, respectively. In such scenarios, it might be
necessary to expand the suspected source region for the future applications
to find the source locations. However, if source locations are known to
reside in the suspected region, the sources can definitely be near the
boundaries. In such cases, the point with Fmin should be
considered as the estimated source location. Releases 3, 5, and 7 have their
Fmin occurring at inner grid points, similar to release 2
shown in Fig. . For none of the releases does the candidate location closest to the actual site
yield the best match between model simulation and observations quantified by
the cost function F. Among the six releases, the estimated source
location for release 2 is the closest to its actual release site, with a
distance of 26.4 km.
The release rates obtained along with the likely source locations are
underestimated by a factor of 3 for release 1, and overestimated by a factor
of 3 for releases 4 and 7, while the estimates for releases 2, 3, and 5 are
much better, with relative errors of -27.6 %, -5.4 %, and 21.5 %,
respectively. Table also lists the release rates
q′ estimated with the exact source location assumed known. These estimates
for all releases are within a factor of 2 of the actual release
rates, and the largest relative error is 53.3 % for release 1. The
posterior uncertainties of the release rate estimates ϵq′ are
also calculated and listed. They range from 1.8 kg h-1 for release 2
to 6.2 kg h-1 for release 1. These posterior uncertainties appear to be underestimated, likely
because of the model uncertainty assumption, including its simplified formulation
as well as the chosen parameter values. Whether the source location is known
or unknown, release 2 has one of the best emission estimates among the six
releases, probably because the HYSPLIT forward model has the best performance
for this release. The significant model errors when
simulating the transport and dispersion even with the exact source terms are
mostly caused by the meteorological uncertainties, while the HYSPLIT physical
schemes and parameters, as well as the numerical discretization, also
contribute.
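The relative errors and factor-of-2 statement quoted above follow directly from the tabulated release rates, as the short check below shows. The actual, qmin, and q′ values are copied from the table of source location results; the helper name and the dictionary layout are illustrative only.

```python
# release number: (actual, q_min, q_prime), all in kg h-1, copied from the table
releases = {
    1: (69.3, 23.9, 106.3),
    2: (67.0, 48.5, 61.5),
    3: (67.0, 63.4, 41.7),
    4: (66.3, 185.7, 75.1),
    5: (60.0, 72.9, 42.6),
    7: (61.0, 201.0, 66.0),
}


def relative_error_percent(estimate, actual):
    """Signed relative error of an estimate, in percent."""
    return 100.0 * (estimate - actual) / actual


# Errors of q_min (location unknown) and ratios of q_prime (location known)
rel_qmin = {n: relative_error_percent(qmin, act)
            for n, (act, qmin, _) in releases.items()}
ratio_qprime = {n: qp / act for n, (act, _, qp) in releases.items()}

for n in releases:
    print(n, round(rel_qmin[n], 1), round(ratio_qprime[n], 2))
```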
An assumption made in this inverse modeling algorithm is that the differences
between model and observation have a normal distribution with a zero mean.
Figure shows the probability density function (pdf) of
ln(ch)-ln(co) for the six CAPTEX releases using the estimated release
rate q′ listed in Table . The pdf distribution of
ln(ch)-ln(co) for release 2 is consistent with the normal distribution
assumption, and the pdf for release 4 shows the largest deviation from a
normal distribution, while those for the other four releases resemble a normal
distribution to some extent. The largest relative error for release 1 is
likely related to the negative mean of the ln(ch)-ln(co) distribution
shown in Fig. . The overestimated q′ probably results from
the compensation of the model bias. Note that the better performance using
ln(ch)-ln(co) than ch-co is believed to be caused by the fact that the
normal distribution assumption is mostly valid for the former but probably
invalid for the latter.
The meteorological field and the observations are the two major inputs to the
current inverse modeling. As discussed above, the better model performance for
release 2 leads to better inverse results than for the other releases.
However, it is impossible to eliminate the model uncertainties. In practice,
ensemble runs can be used to quantify the uncertainties and reduce the model
errors by taking the average or median values of the ensemble runs. On the
other hand, increasing the number of observations is an effective way to improve the
inverse modeling results and reduce their uncertainty. In principle,
when the release strength is the only value to be determined, each
measurement within the predicted plume can provide an independent estimate.
However, relying on a single observation to estimate the strength is
problematic since a particular model output can be very different from the
observation and thus lead to an erroneous estimation of the source
strength when used in isolation. For instance, although the HYSPLIT
predictions of release 2 with exact source terms are very good, compared with
individual measurements, they have severe underestimation (e.g., 0.77 pg m-3
predicted versus 686 pg m-3 measured), as well as significant
overestimation (e.g., 2033 pg m-3 predicted versus 31.2 pg m-3
measured). Therefore, similar to a regression technique, increasing the
sampling number can improve the final results, as exemplified by the very
good source term estimation for release 2 when using all the available
measurements. Also note that the samples outside predicted plumes do not
contribute to the inverse modeling. Table lists the total
measurement counts for each release, but the measurements actually
contributing to the inverse modeling are those inside the HYSPLIT plumes,
including those with zero or background concentrations. The numbers of such
effective measurements inside the plumes generated by HYSPLIT from the exact
source location and time period are reduced to 148, 237, 211, 68, 46, and 53,
for releases 1–5 and 7, respectively. The largest number of effective
measurements, 237, of release 2, also indicates the best performance of the
HYSPLIT simulation among those of the six releases. The effectiveness of the
measurements will change when source location or release time is changed. The
measurements that are not active in determining the source strength with a
known source location and release time may be effective to locate the source
locations.
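The risk of relying on a single observation can be quantified with the two model-measurement pairs quoted above. The sketch assumes that the predicted concentration scales linearly with the release rate, so one sample implies a release rate equal to the rate used in the simulation times the measured-to-predicted ratio. The function name is hypothetical.

```python
def single_observation_estimate(q_used, predicted, measured):
    """Release rate implied by one sample, assuming concentration scales linearly with q."""
    return q_used * measured / predicted


q_actual = 67.0  # kg h-1, release rate used in the HYSPLIT simulation of release 2

# 0.77 pg m-3 predicted versus 686 pg m-3 measured
from_underprediction = single_observation_estimate(q_actual, 0.77, 686.0)
# 2033 pg m-3 predicted versus 31.2 pg m-3 measured
from_overprediction = single_observation_estimate(q_actual, 2033.0, 31.2)

print(round(from_underprediction), round(from_overprediction, 2))
```

Taken in isolation, the first sample would imply a release rate of tens of thousands of kg h-1 and the second a rate near 1 kg h-1, several orders of magnitude apart, which illustrates why many measurements need to be combined.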
The source location (latitude, longitude) and release rate
qmin identified by the minimal normalized cost function
Fmin for each CAPTEX release. A total of 121 candidate
locations are prescribed with 0.2° resolution in both longitude and
latitude directions, centered at (40.0° N, 84.5° W) for
releases 1–4, and at (46.6° N, 80.8° W) for releases 5 and
7. Δ is the distance between the point with Fmin and
the actual release site. q′ is the estimated release rate by assuming that
the actual release location is known. ϵq′ is calculated using
$$\frac{1}{(\epsilon_{q'})^2} = \frac{1}{(\epsilon_q^b)^2} + \sum_{m=1}^{M}\frac{1}{(q')^2 \times (\epsilon_m^{\ln(c)})^2},$$
where $\epsilon_m^{\ln(c)}$ is obtained
using Eq. (). For all of the cases, fo=20 %, ao=20 pg m-3, fh=20 %, and ah=20 pg m-3. Logarithm
concentration is taken as the metric variable.
     Source location (latitude, longitude)            Release rate (kg h-1)
No.  Actual             Estimated        Δ (km)       Actual  qmin   q′     ϵq′
1    39.80°, -84.05°    41.0°, -83.9°    134.2        69.3    23.9   106.3  6.2
2    39.90°, -84.22°    39.8°, -84.5°    26.4         67.0    48.5   61.5   1.8
3    39.90°, -84.22°    40.8°, -85.3°    135.8        67.0    63.4   41.7   2.6
4    39.90°, -84.22°    40.2°, -85.5°    114.1        66.3    185.7  75.1   4.6
5    46.62°, -80.78°    46.2°, -81.0°    49.7         60.0    72.9   42.6   3.0
7    46.62°, -80.78°    47.4°, -81.2°    92.5         61.0    201.0  66.0   3.9
Distribution of 121 candidate source locations for release 2. The
minimal cost function at each location associated with an optimal release
strength is indicated by color. The cost function defined in
Eq. () is calculated with fo=20 %, ao=20 pg m-3,
fh=20 %, and ah=20 pg m-3. The actual source location,
Dayton, Ohio, USA, is shown as a red diamond.
Probability density function (pdf) of ln(ch)-ln(co) for the
six CAPTEX releases. Units of ch and co are pg m-3. The model
prediction ch is calculated using the estimated release rate q′ listed
in Table . ln(ch)-ln(co) is calculated when
both ch and co are non-zero. The number of data points used for pdf
calculation is 70, 184, 77, 49, 29, and 30, for releases 1–5, and
7, respectively.