Introduction
One of the most important challenges seismologists and earthquake engineers
face in designing large civil structures (e.g., buildings, dams, bridges, power
plants) and response plans, especially in highly populated cities prone to
large damaging earthquakes, is the reliable estimation of ground-motion
characteristics at a given location. Ground-motion prediction equations
(GMPEs), which are one of the most important elements for probabilistic
seismic hazard analysis (PSHA), are designed for this purpose. These are
obtained from regression analysis by fitting a dataset (empirical and
simulated) and are mainly expressed in terms of site conditions,
source–site distance (e.g., rupture distance or Joyner–Boore distance, denoted
as RJB distance hereafter), magnitude, and mechanism, although other terms such as
directivity and hanging-wall effect are also considered
. The equations can be derived for peak ground
displacement (PGD), peak ground velocity (PGV), peak ground acceleration
(PGA), and spectral acceleration (SA) for a damping of 5 % at different
periods. Ideally, an optimal GMPE has to be robust and include physical
terms to avoid overfitting the data, which can result in the inclusion of too
many parameters. When other effects are considered (such as amplitude and
duration of rupture directivity; ) or more data are
available , GMPEs are modified to better explain
attenuation patterns.
Many efforts have been made to characterize the seismic ground-motion
considering both real and simulated data. For example, using real data, five
research groups under the Pacific Earthquake Engineering Research Center Next
Generation Attenuation (PEER NGA) project derived GMPEs for shallow crustal
earthquakes considering an extensive database of recorded ground motion
. Later, obtained
GMPEs using both synthetic data and two subsets of accelerograms of the NGA
database . highlighted the
necessity to merge finite fault modeling with
observations to obtain GMPEs that better predict the amplitudes in zones
where data are insufficient. Verification and validation studies
were also conducted in a large effort to
understand ground motion and showed the importance of both accurate source
parameters and the geological description of the medium to reproduce observed
ground motion. improved the agreement between observed
ground motion and GMPEs by including site effects of the area. Numerical
simulations have also helped to explain ground-motion characteristics. For
instance, described attenuation patterns for both deep
in-slab and shallow interplate earthquakes, while explained
ground-motion amplification due to a volcanic layer.
modeled the decay of spectral amplitudes in several locations in North
America.
In this study, we investigate the level of complexity needed in kinematic
rupture models of magnitude 6.5 strike-slip events to produce
ground motion similar to a reference GMPE. To this end, we utilize the polynomial chaos (PC)
approach to build functional
representations of PGV responses of an original source model. Thanks to the
significant reduction in computational cost of the PC surrogate models (in
comparison with both the original source model and a Bayesian analysis based
on Markov chain Monte Carlo (MCMC) sampling, which requires a prohibitive number of model runs;
), it is suitable to utilize the PC surrogates in
a Bayesian inference framework . This
enables us to quantitatively rank different kinematic source models given by
the PGVs they produce and identify the most likely one that fits a chosen
reference GMPE (expectation). The ranking considers uncertainties in both the
GMPE and model parameters. This provides useful insight on the level of
complexity needed in kinematic source models for ground-motion simulations to
satisfy both observational constraints and engineering/design requirements
for seismic safety.
This paper is organized as follows. In Sect. , we provide
detailed descriptions of the source model configurations, including the
calculation of synthetic seismograms. In Sect. , we present the
PC analysis of PGVs as a function of random variations of the kinematic
models, including the validation of PC surrogate models and discussions of
various statistical quantities. In Sect. , we conduct a
PC-based Bayesian inference analysis to identify the most likely kinematic
rupture model that best fits a chosen GMPE reference curve. Finally, we
conclude our key findings and propose potential improvements for future work
in Sect. .
Source model
A magnitude Mw=6.5 strike-slip earthquake (seismic moment
6.31×1018 Nm; rake=0∘) on a single-segment
vertical fault plane is considered. The fault plane is chosen to be a
rectangle with fixed length L=27 km and width W=10 km, which are the
most frequent values among 100 sample realizations following scaling
relations (e.g., ). The
top of the fault plane is located 2 km below the ground surface.
Figure shows an example configuration of the fault
plane, in which the red star denotes the hypocenter and the ellipse is the
asperity with Gaussian slip distribution inside. The maximum slip
Smax is chosen such that the mean slip (over the entire fault
plane) remains constant (0.71 m) when varying the ellipse size (which
ensures that the moment magnitude remains constant, Mw=6.5). The
slip between the elliptical patch boundary and dashed rectangle
(Fig. ) is set to be Smax/e (where e is the
Euler number), the minimum value at the patch boundary from the Gaussian
slip distribution. The slip between the solid and dashed rectangles (the
horizontal and vertical gaps are 5 % of the length and width of the fault
plane, respectively) is tapered to avoid non-physical slip patterns. The
entire fault plane is discretized in along-strike and down-dip directions
with a grid size of 0.02 km. We use a regularized Yoffe function
with a rise time Tr=1.25 s following
source-scaling relations and slip acceleration time
tacc=0.225 s, as suggested by . At each node of
the discretized fault plane, we assign Tr, tacc, slip-rate in
along-strike and down-dip directions, and rupture time. We consider a rupture
speed of 0.75Vs (where the shear wave speed Vs is listed in Table 1) in all
source models.
Example of fault plane configuration. The red star denotes
hypocenter location, and the ellipse is the asperity with Gaussian slip
distribution inside. The slip distribution is tapered in the area between the
dashed and solid rectangles.
A virtual network of Nobs=56 stations where PGV responses
are reported by the source model. The solid black line at the center denotes
the length and location of the fault plane. Note that the 56 stations are grouped
into four sets (indicated by different colors/symbols) according to their RJB
distances (see details in Sect. ).
Velocity model used in this study, modified from
.
Depth (km)
Vp (km s-1)
Vs (km s-1)
0
2.4
1.5
0.5
4.4
2
1.5
5.3
2.7
2.5
5.5
2.9
4
5.7
3.3
8
6.1
3.5
14
6.8
3.9
16.6
7.1
4.1
27
8
4.6
350
8.2
4.65
PGVs at a virtual network of Nobs=56 stations
(Fig. ) are calculated from synthetic seismograms of the
two horizontal components of ground motion at each site for a large set of
source rupture models. We use COMPSYN , a code based on
the discrete wavenumber/finite element method proposed by
to calculate the synthetic seismograms up to a maximum frequency of 1.5 Hz
at each station of the virtual array. COMPSYN solves the equation of motion
considering initial conditions of zero displacement and velocity at a
reference time t0 and specifying traction or displacement on the bounding
surface of the medium (boundary conditions) using the unit outward normal
vector (details about the scheme can be seen in ). The
grid resolution used in COMPSYN is variable and uses a spacing of one-sixth of
the minimum shear wavelength at depth z. The grid extends to a total depth
that depends on the wavenumber, which means that the maximum depth decreases
when the wavenumber increases. This approach considers a layered 1-D
velocity structure. We apply the velocity model shown in
Table , which corresponds to a slightly modified version of
the generic model by for California. The
resulting PGVs serve as our quantities of interest (QoIs, each denoted as
Qj, for j=1,2,…,Nobs). We aim at understanding
stochastic source model PGV responses to random fault plane configurations of
the source process (slip distributions and hypocenter location). To this end,
we consider variations in seven physical parameters listed in
Table , which parameterize the fault plane configurations,
i.e., locations of both the hypocenter and elliptical asperity patch, as well
as its shape and orientation. We restrict the hypocenter and elliptical patch
to be inside the fault plane and limit the area ratio (AR) of the elliptical
patch to the entire fault plane (L×W) between 5 and 29 %. These
restrictions lead to nonlinear dependency between feasible ranges of
different physical parameters (see Appendix for
more details).
Parameters governing fault plane
configurations.
Index
Parameter
Physical interpretation
1
AR
Area ratio, AR=πabL×W∈ [0.05,0.29]
2
xh (km)
x coordinate of the hypocenter xh∈[-13.5,13.5]
3
zh (km)
z coordinate of the hypocenter zh∈[-5,5]
4
a (*) (km)
Semi-major axis a∈ AR⋅L⋅Wπ,L/2
5
θ (*)
Inclination angle of the elliptical patch
6
xc (*) (km)
x coordinate of the center ofelliptical patch
7
zc (*) (km)
z coordinate of the center ofelliptical patch
* denotes parameters whose feasible ranges are dependent on
others.
Polynomial chaos framework
PC expansions
are used in this
study to understand earthquake rupture model responses (in terms of PGVs) to
random configurations of slip distribution and hypocenter location. We
associate each of the physical parameters with a canonical PC random
variable ξi (i∈{1,2,.…,nd}, where nd=7 is
the stochastic space dimension) and assume all ξi values are independent
and uniformly distributed over [-1,1]. That is, the joint distribution of
the random parameter vector ξ is
p(ξ)=2-7ifξ∈Ξ≡[-1,1]7,0otherwise.
Each random parameter vector ξ∈Ξ can be linked uniquely
to a realization of the physical parameter vector (see mapping details in
Appendix ). We thus focus on constructing functional
representations of PGV responses at each station with respect to the
canonical variable ξ, which parameterize the physical parameters
in Table . It is worth mentioning that the mapping from
canonical random variable ξ to physical fault plane configuration
parameters does not lead to uniform distributions for physical parameters,
due to their interdependency as indicated in Table .
Nevertheless, the validity of PC expansions based on canonical random variable
ξ is well maintained, as suggested by the cross-validation and
empirical error analyses later in this section.
Let Qj(ξ) be the PGV response to ξ at the
jth station (j∈{1,2,…,Nobs}), and assume each
Qj is a second-order random variable, i.e.,
Qj(ξ) is in the Hilbert space L2(Ξ,p), and
EQj2=∫ΞQj(ξ)2p(ξ)dξ<+∞,∀j∈{1,2,…,Nobs}.
One can approximate Qj(ξ) using a truncated PC expansion
as follows:
Qj(ξ)≈Q̃j(ξ)=∑α=0NpcαΨα(ξ),∀j∈{1,2,…,Nobs},
where Np is a truncation parameter and (Np+1) is the
number of expansion terms retained in the PC surrogate models. In this study,
we truncated the PC expansion at total polynomial order of q=9. Given
nd=7, one can calculate the total number of polynomials via
Np+1=(q+nd)!q!nd!=11440.
By adopting the classical convention of Ψ0(ξ)=1, the
mean and variance of a PC surrogate Qj(ξ) can be
expressed as
EQ̃=∑α=0NpcαΨα,1=c0,
and
VQ̃=EQ̃-EQ̃2=∑α,β=1NpcαcβΨα,Ψβ=∑α=1Npcα2‖Ψα‖L22,
where ⋅ denotes the inner product in the Hilbert space
L2(Ξ,p) with respect to the joint distribution p(ξ)
.
To determine the expansion coefficients (cα values) in
Eq. (), we rely on a Latin hypercube sample (LHS)
set (denoted as PLHS hereafter) of
NLHS=8000 earthquake rupture model realizations through
{ξk}1≤k≤NLHS, and solve the following basis
pursuit denoising (BPDN) problem at each station:
c*=argminc∈RNp+1||c||l1s.t.||Qj-[Ψ]c||≤γ||Qj||l2,∀j∈{1,2,…,Nobs},
where Qj=(Qj(ξ1),Qj(ξ2),…,Qj(ξNLHS))T is the model PGV realization
vector at the jth station, and c∈RNp+1 is
the coefficient vector for the corresponding PC surrogate model.
[Ψ]∈RNLHS×(Np+1) denotes
the polynomial matrix with each element
[Ψ]i,α=Ψα(ξi). Note that
[Ψ] is station invariant. The scalar parameter γ
indicates the model noise level and is determined numerically via a k-fold
(k=5) cross-validation process over a discrete grid
of γ={10-4,10-3,10-2:0.005:0.2}.
Following , , variance-based first-order
and total-order sensitivity indices associated with a subset of random
variables (i⊂{1,2,…,nd}) can be calculated,
respectively, as follows:
Si=∑α∈Sicα2‖Ψα‖L22∑α=1Npcα2‖Ψα‖L22,
Ti=∑α∈Ticα2‖Ψα‖L22∑α=1Npcα2‖Ψα‖L22,
where Si (first-order sensitivity) is the relative
variance contribution of those polynomials (denoted as index set
Si) exclusively related to random variables in the
subset i, while Ti (total-order sensitivity) is
the relative variance contribution of polynomials (denoted as index set
Ti) involving any of the random variables in i
(including cross polynomials between variables in i and its
complement i∼, i∪i∼={1,2,…,nd}). Note that by definition the two polynomial index
sets satisfy Si⊂Ti.
Validation of PC models
We first validate our PC surrogate models for PGVs at all stations. To this
end, we introduce a second independent source model simulation ensemble
(again an 8000-member LHS set PLHSvalid⊂Ξ) for the purpose of validation. (Note that
PLHSvalid is independent of the training set
PLHS.) The following relative l2 error is then
examined for PGVs at each station.
ϵj=∑k=1NLHS(Q̃j(ξk)-Qj(ξk))2∑k=1NLHSQj(ξk)2,∀j∈{1,2,…,Nobs},
where Q̃j(ξk) and Qj(ξk)
denote PC and source model responses, respectively, to ξk at the
jth station. ξk∈PLHS or ξk∈PLHSvalid depending on the sample set used to
estimate the errors.
Relative l2 errors of PC surrogate models. The cross-validation
errors are close to the error estimated from validation set. For brevity, we
omit the cross-validation errors in the plot.
Figure shows error estimates of PC surrogate models over the
training set (PLHS, blue dots) and the validation set
(PLHSvalid, red dots). It is not surprising to
see slightly larger error estimates on the validation set, as the PC
reconstruction process is unaware of this dataset. However, because almost
all error estimates fall below 10 % range, and in light of the close
agreement (about 4 % difference) between the blue and red dots, our PC
surrogate models are deemed to suitably reproduce source model PGV responses
throughout the entire station network.
PC-predicted PGV distributions at two selected stations (as
indicated in Fig. ). Distribution curves are obtained using
kernel density estimation from PC realizations
on a 1 million member LHS set PLHS1E6.
Apart from the above error estimates, the convergence of PC surrogate models
with respect to truncation order is also investigated from a statistical
point of view. Figure shows PGV distributions from PC
resampling on a 1 million member LHS set (PLHS1E6) at two selected stations, with increasing odd PC truncation orders up
to degree 9. It is seen that when the truncation order is larger than 5,
the difference in the PGV prediction distributions becomes relatively small,
suggesting that our ninth-order PC expansions are sufficiently accurate for
the source model under consideration.
Comparison of PGV distributions predicted by the source model (blue
solid curve) and PC surrogate model (red dashed curve), respectively, at
selected stations (as indicated in Fig. ) over the
validation sample set PLHSvalid. The black
dash-dotted curves are PGV prediction distributions obtained from PC
surrogate model realizations on a 1 million member LHS set
PLHS1E6. Distributions are obtained using
kernel density estimations .
We finally compare distributions of PC and source model predictions; see
Fig. . It is observed that our PC surrogate models are
capable of reproducing PGV distributions produced from source model
realizations of the validation set PLHSvalid.
Besides, the excellent agreement between the two PC-predicted distribution
curves in Fig. suggests that our existing
8000-model-simulation ensemble is statistically representative, which provides
additional confidence in our PC representations.
PC statistics
The PC surrogate models obtained in the previous section provide immediate
access to prediction statistics, as given by Eqs. ()
and (). Figure shows means and standard
deviations of PC PGV predictions at different stations, along with a
reference median PGV curve predicted by the GMPE in
. It is
noted that two stations with similar RJB distance can have very
different PGV values. This is likely due to radiation-pattern effects, in
particular directivity, which is addressed in great detail by
. Besides, it is observed that PC predictions
generally scatter around the GMPE curve. Though one should not expect an exact
match between PC statistic and GMPE predictions, due to the difference in
random sources underlying the two approaches, and the uninformative uniform
canonical PC parameter distributions used to generate PC statistics, it is
worth noting that the similar range of PC and GMPE predictions enables us to
use the GMPE results as “observations” for the purpose of parameter
inference discussed in Sect. . One can also observe that PGVs
are generally largest near the fault plane and decrease with increasing
RJB distance. The overall tendency of PC prediction uncertainty
(quantified by the standard deviation bars) seems to decrease with increasing
RJB distance as well.
Comparison of PC statistics (based on uniform distribution
assumption of the canonical PC random parameters) with GMPE results. Solid
black curve denotes the median GMPE prediction, while the dashed lines are
GMPE standard deviation bounds. Note that log scales are used in the plot.
The conditional mapping from canonical PC random variables (ξ) to
physical fault plane configurations makes it difficult to isolate the
relative impact of individual parameters. To address this difficulty, we rely
on the global sensitivity analysis and discuss
the statistical significance of each canonical random parameters in the
rupture model.
First- (a) and total-order (b) sensitivity indices
at each station.
Figure shows both the first- and total-order sensitivity
indices associated with each random parameter at different stations. These
sensitivity indices reveal that the model PGV response is most sensitive to
the location of the hypocenter (xh is dominant and zh
plays a secondary role) throughout all stations, whereas the remaining random
parameters (associated with elliptical asperity patch) are relatively
insignificant. While it might be reasonable to neglect the elliptical patch
parameters' impact on PGV response variability at far stations (with
RJB distance roughly more than 10 km away from the center), it is
evident that at near-the-center stations, those elliptical patch parameters
can still lead to a considerable impact on PGV response.
First-order sensitivity indices with respect to grouped parameters.
To better illustrate the above sensitivity observation, we divided the
parameters into the following two groups (ξhypo={ξ2xh,ξ3zh} and
ξellip={ξ1AR,ξ4a,ξ5θ,ξ6xc,ξ7zc}; the superscripts denote the
corresponding physical parameters) and calculated the first-order sensitivity
indices associated with ξhypo and ξellip
using Eq. (), denoted as Shypo and
Sellip, respectively. Note that the combined effect
(interaction) of hypocenter location and elliptical patch parameters is
simply given by Shypo×ellip=1-Shypo-Sellip. The resulting group
sensitivity indices are shown in Fig. . It is now clear that
the hypocenter location alone is responsible for 80–90 % of the
variability in PGVs at distant stations. Meanwhile, near the center, the
hypocenter location alone is associated with only 55–75 % of the PGV
variability, suggesting that the elliptical patch parameters play important
roles with about 25–45 % contribution to the total PGV variability.
Bayesian inference
In this section, we utilize a Bayesian approach
to find the most likely fault
plane configuration, in the sense that the resulting earthquake rupture model
produces PGVs that best match the reference GMPE curve for the same magnitude and
focal mechanism . To this end, we first obtain the
GMPE-predicted PGVs at the stations shown in Fig. , denoted as
d, which serve as observational data in our Bayesian inference, and
compare d with our PC surrogate model predictions
d̃(ξ)=(Q̃1(ξ),Q̃2(ξ),…,Q̃Nobs(ξ))T.
Bayesian formulation
To formulate the Bayesian problem, we start with Bayes' formula:
p(η|d)=p(d|η)p(η)p(d)∝p(d|η)p(η),
where η is the parameter vector to be inferred, p(η) is
the prior probability distribution of η, and
p(d|η) is the likelihood of observing d given
η. The denominator p(d) is the marginal distribution known
as evidence. (Note that this evidence can be neglected, as the MCMC sampling method utilized below
solely relies on the proportionality.) We adopt the assumption of independent
Gaussian error at each station location; i.e., the discrepancy between
observations (GMPE-predicted PGVs) and PC predictions at each station is an
independent Gaussian variable:
p(ϵj)=p(dj-d̃j)=12πσ2exp-(dj-d̃j)22σ2,∀j∈{1,2,…,Nobs}.
Recall that the PC prediction variability seems to decrease with
RJB distance according to Fig. . To account for
this decay of PGV variance with RJB distance in the Bayesian
inference analysis, we partition the Nobs stations into four
groups according to their corresponding RJB distances as indicated
in Fig. , and associate each group of stations with a
hyperparameter σl(j)2 (l(j)∈{1,2,3,4} depending on the
RJB distance of the jth station). As a result, the likelihood can
be expressed as
p(d|η)=∏j=1Nobs12πσl(j)2exp-(dj-d̃j(ξ))22σl(j)2,
and accordingly the inference parameter vector η reads
η=(ξ1,ξ2,…,ξ7,σ12,σ22,…,σ42)T.
Our numerical experiments suggest that the 4-σ2 model above
outperforms the model with only one hyperparameter for all stations. It is
noted that we limit the number of uncertainty hyperparameters
(σi2 values) to four in this study, due to the limited number of
observations (PGVs at limited number of stations). If more observations were available, it might be beneficial to increase the number of hyperparameters.
The prior distribution of η, without additional information on the
model parameters, is usually given by assumptions of uniform distribution for
canonical PC parameters ξ, and Jeffrey's priors
for hyperparameters σl2 (as σl2 is always greater than
zero); consequently,
p(η)=127∏l=141σl2∀ξ∈Ξand∀σl2>0,0otherwise,,
and Bayes' rule reduces to
p(η|d)∝p(d|η)p(η)=∏j=1Nobs12πσl(j)2exp-(dj-d̃j(ξ))22σl(j)2∀ξ∈Ξand127∏l=141σl2∀σl2>0,0otherwise.
We rely on the adaptive metropolis MCMC approach
to sample the above posterior distribution. It
is worth noting that MCMC methods, despite the improved efficiency against
traditional MC approaches, generally require a large number of samples
(typically tens of thousands, and even larger depending on the dimensionality
of the problem). This is one of the main reasons why we utilize PC
techniques, as the use of the corresponding PC surrogates in the MCMC
simulation leads to significant reduction in computational cost. In this
study, the MCMC sample size for inference is set to 106.
Inference results
Posterior probability distributions of prediction uncertainty
parameters (each PDF curve is scaled to have unit peak height for better
comparison).
As mentioned above, we exploit the PC surrogate models in Bayesian inference
analysis and update the posterior distribution of random parameters
(ξ∈Ξ), as well as PGV prediction uncertainties
(σl2 values), in light of the GMPE observations.
Figure shows the posterior probability distributions of
hyperparameters σl2 (l∈{1,2,3,4}). It is evident that
σl2 decreases with RJB distance (from l=1 to l=4),
which supports our previous ansatz from Fig. .
Similarly, we examine the sampling chains of PC random parameters ξi (i∈{1,2,…,7}). While some parameters (e.g., ξ1,ξ2,ξ3, and ξ6) yield very informative posterior distributions (not shown here),
others look relatively less informative. It is noted that our goal is to
estimate the posterior distributions of the physical parameters in
Table , instead of the PC parameters. Thus, it is desired
to map the ξ chain into the corresponding physical configuration
chain, before inferring the most likely fault plane configuration.
Prior (dashed black, derived from uniform ξ distribution
in Ξ) and posterior (solid blue) distributions of physical fault
plane configuration parameters. The bottom right panel shows the inferred
fault plane configuration.
Figure shows the posterior distributions of the physical
parameters after mapping from the PC parameter chain of ξ, as well
as the corresponding inference of the fault plane configuration (bottom right
panel). It is observed that in light of the GMPE PGV observations (1) the
hypocenter location (xh and zh) is well identified;
(2) the size of the elliptical patch seems to be more likely near the lower
bound of the prior; (3) the inclination angle of the elliptical patch, as
well as the location of the patch, are less conclusive. For example, despite
the clear peak in the inclination angle plot, the posterior distribution is
relatively flat, suggesting limited information gain compared with the prior
knowledge. Furthermore, the xc distribution only shows the fact
that the ellipse tends to be in the left half of the fault plane; the
definite location of the elliptical patch (either xc or
zc) is ambiguous. These findings are generally consistent with the
results of the sensitivity analysis. Since the model is primarily sensitive
to the hypocenter location, perturbing the hypocenter location leads to more
effective adjustment in PGV responses. On the other hand, elliptical patch
parameters have relatively small impact on PGV variance, which calls for more
observational data to pin down those parameters.
One needs to be cautious about the Bayesian inference results discussed
above. From the physical point of view, the spatial distribution of those
stations (see Fig. ) where PGVs are reported is almost
“symmetric” about the center of the fault plane (x=0 and y=0); as a
result, one would expect to see a “symmetric” twin configuration that is
roughly equally plausible from the Bayesian inference. However, this
“symmetric” counterpart is clearly missing in the above inference results.
This is probably because when MCMC chain converges to the high probability
region of hypocenter location in the bottom right quadrant of the fault
plane, it becomes more and more difficult to escape from this high
probability region and explore the other side of parameter space. In other
words, there could be bimodal structures in the distributions of
xh (as well as xc) which the previous MCMC process fails
to identify (e.g., the configuration in which the hypocenter is located in the
bottom left quadrant of the fault plane, and the ellipse is centered at
somewhere on the right half of the fault plane). While in theory it is
possible to identify the missing multimodal distributions of random
parameters by further increasing the number of MCMC samples, the
computational cost can be excessive. Alternatively, we verify our expectation
of seeing the “symmetric” counterpart configuration by rerunning the MCMC
simulation starting with the “symmetric” counterpart configuration (i.e.,
with hypocenter being in the bottom left quadrant of the fault plane, and
elliptical patch being on the right side of the fault plane). The resulting
fault plane configuration inference is shown in Fig. . As
expected, the new MCMC process ended up with a fault plane configuration that
is roughly “symmetric” to the previous inference result, especially for the
hypocenter location. The asymmetric behavior of the elliptical patch stems
from the fact that (1) the Nobs stations are not exactly
symmetrically distributed; thus, one should not expect exact symmetry; (2) as
discussed before, the PGV responses are less sensitive to the elliptical
patch properties, leading to ambiguity in inferring these properties.
Inferred fault plane configuration with MCMC chain starting from the
“symmetric” counterpart configuration.
Inference with restricted prior
The previous inference results are all based on almost complete ignorance of
dependency between hypocenter location and the slip area (asperity). However,
previous studies suggested some constraints
on the relative hypocenter location with respect to the
asperity, and size of the asperity .
In this section, we consider the following restrictions in our inference
analysis:
R-1.
The elliptical patch is inside the dashed rectangle
[L′,W′]=0.9×[L,W] shown in Fig. .
R-2.
The area ratio of the elliptical patch (AR) is between
15 and 29 % of the fault plane area, i.e., 0.15<AR<0.29.
R-3.
The elliptical patch is not too elongated, i.e., the axis
ratio ab≤3.
R-4.
The hypocenter is located outside but near the elliptical
patch, i.e., xh=(a+3ζh1)cos(2πζh2)
and
zh=(b+b3aζh1)sin(2πζh2)
∀(ζh1,ζh2)∈[0,1]2.
One of the advantages of having previous PC surrogate models (which were
built based on uninformative prior that spans a wide range of feasible
scenarios, i.e., minimal restrictions as in Table ) is that
the above four additional parameter restrictions can be efficiently performed
a posteriori, namely without the need for performing new model simulations
.
To begin with, we first incorporate the above restrictions into the
Bayesian framework, namely by modifying the previous prior distribution
(Eq. ) as follows:
p*(η)=127∏l=141σl2∀ξ∈Ξ,∀σl2>0and all restrictions are satisfied,0otherwise.
However, due to the strong restrictions listed above, the support of the
above prior probability distribution (Eq. ) turns out to be
extremely limited in the parameter space Ξ, leading to
computationally inefficient MCMC sampling (since most of the samples drawn
from a proposal distribution will end up not satisfying at least one of the
restrictions and thus zero prior probability). To mitigate the difficulty of
inefficient sampling due to restricted prior distribution, we introduce a new
layer of parameterization, mapping from Ξ to restricted physical
configurations. (Details on this new mapping mechanism are given in
Appendix .)
Flow chart demonstrating the random sampling process and the
calculation of posterior probability in MCMC. The orange path corresponds to
unrestricted sampling process, whereas the blue path incorporates additional
restrictions on fault plane configurations. Note that Y denotes the fault
plane configuration vector in the physical domain, e.g., Y=(AR,xh,zh,a,θ,xc,zc)T.
Figure shows the MCMC process of drawing random samples
from proposal distributions and calculates the resulting posterior
probability. Without additional restrictions (orange path), the parameter
vector ζ=ξ, and the whole process reduces to the standard
MCMC process we used in the previous section. By introducing the new
parameterization process (see Algorithm ), we are
transforming the original problem, which is based on PC parameter vector
ξ, into a new inference problem based on ζ (we denote
ζ as the auxiliary random parameter vector hereafter, to distinguish
it from the PC parameter vector ξ). This transformation is based on
the mapping from ζ to ξ (i.e.,
ξ=ξ(ζ)) via their commonly associated physical
configuration. For clarity, we formulate the new ζ based Bayesian
problem as follows:
p(η*|d)∝127∏l=141σl2∏j=1Nobs12πσl(j)2exp-(dj-d̃j(ξ(ζ)))22σl(j)2∀ζ∈Ξ,∀σl2>0,0otherwise.
where η*=(ζ1,ζ2,…,ζ7,σ12,σ22,…,σ42)T.
Prior (dashed black, derived from uniform ζ distribution
in Ξ) and posterior (solid blue) distributions of physical fault
plane configuration parameters in restricted inference. The bottom right
panel shows the inferred fault plane configuration.
Following the same analysis as discussed before, we show the inference
results under restrictions in Fig. . Note that
the prior distributions of those physical parameters are different from those
in Fig. , as the new ones are derived from uniformly
distributed auxiliary random vector ζ∈Ξ, instead of
PC parameters ξ∈Ξ. Nevertheless, we see very consistent
results of hypocenter location, as well as the location of the elliptical
patch, compared with those in Fig. . The area ratio AR,
though larger than the previous inferred value, still favors the lower end of
the prescribed parameter range. The elliptical patch ends up with a larger
area and longer semi-major axis (compared to the results in
Figs. and ). These differences are
directly stemming from restrictions R-2 and R-3.
Restricted Bayesian MCMC sample chains of the
hypocenter (a) and elliptical patch center (b);
panel (c) shows the correspondence between xh and xc
chains.
Though it is not obvious to see from Fig. , the
restricted Bayesian MCMC process is indeed aware of the existence of the
“symmetric” counterpart configuration. Figure shows
the restricted Bayesian MCMC sample chains of both the hypocenter (Fig. a)
and elliptical patch center (Fig. b). It is seen that despite the fact the
hypocenter samples are mostly clustered around xh=5 km, there is a
sample cloud on the opposite side (xh=-5 km), corresponding to the
“symmetric” counterpart configuration discussed before. The sample cloud of
the elliptical center also shows bimodal distributions, with primary cloud on
the left (xc<0) and secondary “symmetric” counterpart on the
right (around xc=5 km). The correspondence between xh
and xc is shown in Fig. c, from which it is
seen that when xh is positive, xc is more likely to be
negative, and vice versa, suggesting that hypocenter and ellipse center are on
the opposite side of the fault plane, as previous inference results
suggested. Note that in this restricted Bayesian MCMC sampling, the total
number of samples remains 106. The ability to observe the “symmetric”
counterpart clouds is probably due to the fact that by introducing the
auxiliary parameter ζ, we dramatically shrunk the sampling space
(it is only a small subspace of the original unrestricted parameter space).
As mentioned before, introducing the auxiliary parameter ζ leads
to significant efficiency improvement in the MCMC sampling process.
Comparing PGVs
Comparison of PC-predicted PGV responses with aforementioned three
inferred fault plane configurations with the reference GMPE curve. Dashed
lines are standard deviation bounds of GMPE predictions.
Comparison of PC-predicted PGVs of different inferred configurations
with the reference GMPE curve. Unrestricted-1 and -2 correspond to inferences
in Figs. and , respectively. Each inferred configuration leads to PGV predictions in Fig. 15, as indicated by different colors.
Inference
ϵ=∑j=1Nobs(Q̃j-QjGMPE)2Nobs
r=1Nobs∑j=1NobsQ̃j-QjGMPEQjGMPE2
Unrestricted-1 (blue)
1.1135
0.3395
Unrestricted-2 (red)
1.7413
0.3993
Restricted (green)
1.4564
0.3702
We summarize the Bayesian analysis by comparing PC-predicted PGV responses to
the three inferred fault plane configurations discussed above with the
reference GMPE curve (see Fig. and
Table ). We observe that all three configurations lead to
a relatively close match between PC predictions and the reference GMPE curve.
By comparing either the root mean square (rms) error or the relative rms
error (see Table ), we conclude that the red dots
(corresponding to the unrestricted inference in Fig. )
clearly show larger discrepancy from the GMPE curve, suggesting smaller
likelihood compared to the other two, consistent with our Bayesian analysis.
When comparing the blue and green dots (unrestricted inference in
Fig. versus restricted inference in
Fig. ), the former seems to be slightly better,
which is expected because of the additional flexibility in fitting the GMPE
curve. Nevertheless, it might be better to report the restricted inference
results (configuration in Fig. ), as it
satisfies all the restrictions learned from previous studies while retaining
plausible agreement with the reference GMPE curve.
Conclusions
An earthquake rupture model was adopted to explore the stochastic dependence
of ground motion (in terms of PGVs) on random fault plane configurations.
Thanks to the ability to generate two independent source model simulation
ensembles with 8000 members each, we were able to build successful PC
surrogate models to assess PGV responses over the virtual network of
Nobs=56 stations from one ensemble, and then to validate the
quality of PC models on the other. Our statistical analysis showed that the
two 8000-member LHS ensembles of source model simulations are adequate to
represent the underlying PGV distributions at all stations, as they closely
match with PC-predicted distributions over a much larger sample set.
A global sensitivity analysis of PC surrogate models was conducted. The
analysis revealed that the source model PGV response is primarily sensitive
to the hypocenter location, and much less sensitive to properties of the
asperity patch, especially at stations far away from the fault plane (in
terms of the RJB distance). While this holds true for all stations,
it is noted that asperity patch properties still carry considerable impact
(20–30 % associated variability) on PGV responses at stations close to
the fault plane, and even more influence (additional 10 % variability) if
one takes into consideration the interaction between asperity patch and
hypocenter location.
Our analysis of PGV variabilities indicated that one needs to be cautious
when interpreting PGVs at near-fault-plane stations, as they are more prone
to higher model noise. This is supported by the Bayesian inference analysis,
in which four independent model noise parameters (σl2 for
l=1,2,3,4) were introduced and assigned to four groups of observational
stations, depending on their RJB distances away from the fault
plane. The Bayesian inference results clearly showed the decreasing trend of
noise parameters (σl2 values) when moving away from the fault plane (see
Fig. ). Further refinement of the noise parameter profile
along the RJB distance, though desired, is prohibited by the
limited number of available observational stations.
We conducted both unrestricted and restricted Bayesian inference analyses to
identify the chosen GMPE reference curve. The key findings are as follows.
(1) Given the station distribution (Fig. ) in this study,
it is more likely to have the hypocenter located in the lower right quadrant
of the fault plane, and the elliptical patch centered in the lower left
quadrant. (2) Due to the considerable “symmetry” presented by those
Nobs stations, the most profound fault plane configuration, which
best reproduce the reference GMPE predictions, can potentially have a
“symmetric” twin configuration, especially for the hypocenter location.
(3) The restricted inference results remain consistent with the unrestricted
ones, with slightly more deviation from the chosen GMPE reference curve.
(4) Most importantly, our analyses suggest that the hypocenter and slip patch
cannot be in the near-surface area of the fault, and they need to be some
distance away from each other in order to produce the proper seismic
radiation pattern, including on-fault directivity. Otherwise, the resulting
near-source waveforms, and hence PGVs, would not match with GMPE results.
This is consistent with the findings of .
The analyses and findings in this study provide useful insights on how
near-source ground shaking (and its variability) depends on random fault
rupture configurations. Interestingly, even very simple source models (with
elliptical slip patches) are able to generate shaking distributions that well
reproduce empirical predictions. To better reproduce the chosen GMPE
reference curve, it might be beneficial to consider two or more asperity
patches, instead of one in this study, in order to reduce the hypocenter
location influence and in return increase the impact of asperity properties.
Another potential improvement can be made by refining the station network. As
mentioned earlier, the Bayesian inference is primarily limited by the number
of available stations at which PGVs are reported. By increasing the number of
PGV reporting stations, one may improve the Bayesian inference results
(e.g., removing the ambiguity in inferring the elliptical patch
location).