Browsing by Author "Aristoff, David, advisor"

Open Access
Data-driven methods for compact modeling of stochastic processes
(Colorado State University. Libraries, 2024) Johnson, Mats S., author; Aristoff, David, advisor; Cheney, Margaret, committee member; Pinaud, Olivier, committee member; Krapf, Diego, committee member
Stochastic dynamics are prevalent throughout many scientific disciplines, where finding useful compact models is an ongoing pursuit. However, the simulations involved are often high-dimensional, complex problems necessitating vast amounts of data. This thesis addresses two approaches for handling such complications: coarse graining and neural networks. First, by combining Markov renewal processes with Mori-Zwanzig theory, coarse graining error can be eliminated when modeling the transition probabilities of the system. Second, instead of explicitly defining the low-dimensional approximation, the appropriate subspace is uncovered through iteration using kernel approximations and a scaling matrix. The resulting algorithm, named the Fast Committor Machine, applies the recent Recursive Feature Machine of Radhakrishnan et al. to the committor problem using randomized numerical linear algebra. Both projects outline practical data-driven methods for estimating quantities of interest in stochastic processes that are tunable with only a few hyperparameters. The success of these methods is demonstrated numerically against standard methods on the biomolecule alanine dipeptide.
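For orientation, the committor named in this abstract is the probability that a process reaches a set B before a set A. The sketch below is not the Fast Committor Machine and is not taken from the thesis. It only illustrates the quantity being estimated, on a hypothetical one-dimensional nearest-neighbor chain with A = {0} and B = {n}, by iterating the defining equation q(i) = p_left q(i-1) + p_right q(i+1) with q(0) = 0 and q(n) = 1. The function name, the chain, and all parameters are assumptions made for illustration.

```python
def committor_1d(n=5, p_right=0.5, sweeps=10_000, tol=1e-13):
    """Committor q(i) = P(reach state n before state 0 | start at i)
    for a hypothetical chain that steps right w.p. p_right, else left."""
    p_left = 1.0 - p_right
    q = [0.0] * (n + 1)
    q[n] = 1.0  # boundary conditions: q = 0 on A = {0}, q = 1 on B = {n}
    for _ in range(sweeps):
        change = 0.0
        # Gauss-Seidel sweep over the interior states
        for i in range(1, n):
            new = p_left * q[i - 1] + p_right * q[i + 1]
            change = max(change, abs(new - q[i]))
            q[i] = new
        if change < tol:
            break
    return q
```

For the symmetric walk (p_right = 0.5) the committor is linear in the state index, q(i) = i/n, which gives a simple check on the iteration. In high-dimensional molecular systems no such direct solve is available, which is the setting the data-driven methods above target.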
Open Access
Stability in the weighted ensemble method
(Colorado State University. Libraries, 2022) Lyons, Carter, author; Aristoff, David, advisor; Cheney, Margaret, committee member; Krapf, Diego, committee member
In molecular dynamics, a quantity of interest is the mean first passage time, or average transition time, for a molecule to transition from a region A to a different region B. Often, significant potential barriers exist between A and B, making the transition from A to B a rare event, that is, an event that is highly improbable. Correspondingly, the mean first passage time for a molecule to transition from A to B will be immense. Using direct Markov chain Monte Carlo techniques to estimate the mean first passage time is therefore computationally infeasible due to the protracted simulations required. Instead, the Markov chain modeling the underlying molecular dynamics is simulated to steady state and the steady-state flux from A into B is estimated. Then, through the Hill relation, the mean first passage time is obtained as the reciprocal of the estimated steady-state flux. Estimating the steady-state flux into B is still a rare-event problem, but the difficulty has shifted from lengthy simulation times to a substantial variance in the desired estimate. Therefore, an importance sampling or importance splitting technique that emphasizes reaching B and reduces estimator variance must be used. Weighted ensemble is one importance sampling Markov chain Monte Carlo method often used to estimate mean first passage times in molecular dynamics. Broadly, weighted ensemble simulates a collection of Markov chain trajectories, each of which is assigned a weight. Periodically, certain trajectories are copied while others are removed, to encourage a transition from A to B, and the trajectory weights are adjusted accordingly. By time-averaging the weighted average of these Markov chain trajectories, weighted ensemble estimates averages with respect to the Markov chain steady-state distribution.
We focus on the use of weighted ensemble for estimating the mean first passage time from A to B, through estimating the steady-state flux from A into B, of a Markov chain that, upon reaching B, is restarted in A according to an initial, or recycle, distribution. First, we give a detailed mathematical description of the weighted ensemble algorithm and provide an unbiased property, an ergodic property, and a variance formula. The unbiased property states that the weighted ensemble average of many Markov chain trajectories produces an unbiased estimate for the underlying Markov chain law. Next, the ergodic property states that the weighted ensemble estimator converges almost surely to the desired steady-state average. Lastly, the variance formula provides the exact variance of the weighted ensemble estimator. Next, we analyze the impact of the initial or recycle distribution, in A, on the bias and variance of the weighted ensemble estimate and compare against adaptive multilevel splitting. Adaptive multilevel splitting is an importance splitting Markov chain Monte Carlo method also used in molecular dynamics for estimating mean first passage times. Previous studies have shown that adaptive multilevel splitting requires a precise importance sampling of the initial, or recycle, distribution to maintain reasonable variance bounds on the adaptive multilevel splitting estimator. We show that the weighted ensemble estimator is less sensitive to the initial distribution, since importance sampling the initial distribution frequently does not reduce the variance of the weighted ensemble estimator significantly. For a generic three-state Markov chain and one-dimensional overdamped Langevin dynamics, we develop specific conditions which must be satisfied for initial distribution importance sampling to provide a significant variance reduction on the weighted ensemble estimator.
Finally, for bias, we develop conditions on A such that the mean first passage time from A to B is stable with respect to changes in the initial distribution. That is, under a perturbation of the initial distribution, the resulting change in the mean first passage time is insignificant. The conditions on A are verified with one-dimensional overdamped Langevin dynamics and an example is provided. Furthermore, when the mean first passage time is unstable, we develop bounds, for one-dimensional overdamped Langevin dynamics, on the change in the mean first passage time and show the tightness of the bounds with numerical examples.
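The Hill relation described in this abstract can be checked by hand on a small example. The sketch below is an illustration only, not code from the thesis: it assumes a hypothetical three-state chain with A = {0}, an intermediate state 1, and B = {2}, where any jump into B is immediately recycled to state 0. The transition matrix and function names are made up for the example. It computes the mean first passage time twice, once by first-step analysis and once as the reciprocal of the steady-state flux into B.

```python
def mfpt_direct(P):
    """Mean first passage time from state 0 to state 2 by first-step analysis.
    Solves m0 = 1 + P00*m0 + P01*m1 and m1 = 1 + P10*m0 + P11*m1."""
    a, b = 1.0 - P[0][0], -P[0][1]
    c, d = -P[1][0], 1.0 - P[1][1]
    det = a * d - b * c
    return (d - b) / det  # m0 by Cramer's rule with right-hand side (1, 1)


def mfpt_hill(P):
    """Mean first passage time as 1 / (steady-state flux into B) with recycling."""
    # Recycled chain on {0, 1}: a jump into B (state 2) restarts in A (state 0).
    q01 = P[0][1]
    q10 = P[1][0] + P[1][2]
    # Stationary distribution of a two-state chain
    pi0 = q10 / (q01 + q10)
    pi1 = q01 / (q01 + q10)
    # Probability per step of entering B at steady state
    flux = pi0 * P[0][2] + pi1 * P[1][2]
    return 1.0 / flux


# Hypothetical transition matrix; the row for state 2 is never used.
P_EXAMPLE = [
    [0.9, 0.1, 0.0],
    [0.5, 0.4, 0.1],
    [1.0, 0.0, 0.0],
]
```

For this matrix both functions return approximately 70 steps. The recycle distribution here is a point mass on state 0; the sensitivity of the result to that choice is the stability question the abstract studies.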
Open Access
Weighted ensemble: practical variance reduction techniques
(Colorado State University. Libraries, 2022) Johnson, Mats S., author; Aristoff, David, advisor; Cheney, Margaret, committee member; Krapf, Diego, committee member; Pinaud, Olivier, committee member
Computational biology and chemistry abound with important constants that researchers seek to estimate. The mean first passage time (MFPT) is one such quantity of interest and is pursued in molecular dynamics simulations of protein conformational changes, enzyme reaction rates, and more. Often, the simulation of these processes is hindered by such events having a prohibitively small probability of observation. For these rare events, direct estimation by Monte Carlo techniques can be burdened by high variance. We analyzed an importance sampling algorithm based on splitting and killing, called weighted ensemble, to address these drawbacks. We used weighted ensemble in the context of a stochastic process governed by a Markov chain (X_t)_{t≥0} with steady-state distribution μ to estimate the MFPT. Weighted ensemble works by partitioning the state space into bins and replicating trajectories in an advantageous and unbiased manner. By introducing a recycling boundary condition, we improved the convergence of our problem to steady state and made use of the Hill relation to estimate the MFPT. This change allows relevant conclusions to be drawn from simulations that are much shorter in time scale when compared to direct estimation of the MFPT. After defining the weighted ensemble algorithm, we decomposed the variance of the weighted ensemble estimator in a way that allows simple optimization problems to be posed. We also defined the relevant coordinate for splitting trajectories in the weighted ensemble method, the flux-discrepancy function, and its associated variance function. When combined with the variance formulas, the flux-discrepancy function was used to guide the choice of binning and replication strategies for the weighted ensemble algorithm. Finally, we discussed practical implementations of solutions to the aforementioned optimization problems and demonstrated their effectiveness in the context of a toy problem.
We found that the techniques we presented offered a significant variance reduction over both a naive implementation of weighted ensemble that is commonly used in practice and direct simulation by naive Monte Carlo. The optimizations we presented correspond to a reduced computational cost for implementing the weighted ensemble algorithm. We further found that our results were applicable even in the case of limited resources, which makes their application even more appealing.
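The binning, replication, and recycling steps described in this abstract can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the thesis's implementation: it uses a hypothetical biased random walk on the states 0, ..., n_states - 1 with A = {0} and B = {n_states - 1}, fixed-width bins, and a naive uniform allocation of per_bin trajectories to every occupied bin, rather than the optimized binning and replication strategies the thesis develops. All names and parameter values are invented for the example.

```python
import random
from collections import defaultdict


def weighted_ensemble_flux(n_states=7, p_right=0.3, bin_width=2,
                           per_bin=4, n_steps=2000, seed=0):
    """Time-averaged flux into B for a toy random walk, using weighted ensemble."""
    rng = random.Random(seed)
    target = n_states - 1      # B = {target}; A = {0}
    particles = [(0, 1.0)]     # (state, weight) pairs; weights sum to 1
    flux_sum = 0.0
    for _ in range(n_steps):
        # Dynamics step: move each trajectory, reflecting at state 0.
        moved = []
        for x, w in particles:
            x = x + 1 if rng.random() < p_right else max(x - 1, 0)
            if x == target:
                flux_sum += w  # record the weight that entered B ...
                x = 0          # ... and recycle the trajectory to A
            moved.append((x, w))
        # Selection step: group trajectories into bins by state.
        bins = defaultdict(list)
        for x, w in moved:
            bins[x // bin_width].append((x, w))
        # Resample per_bin trajectories in each occupied bin, in proportion
        # to weight, and share the bin's total weight equally among them.
        particles = []
        for members in bins.values():
            total = sum(w for _, w in members)
            states = [x for x, _ in members]
            weights = [w for _, w in members]
            for x in rng.choices(states, weights=weights, k=per_bin):
                particles.append((x, total / per_bin))
    flux = flux_sum / n_steps
    total_weight = sum(w for _, w in particles)
    return flux, total_weight
```

Because each bin's total weight is preserved by the selection step and recycled weight is returned to A, the weights continue to sum to 1 up to rounding. By the Hill relation, 1 / flux then serves as an MFPT estimate, with the caveat that a finite time average also includes the initial transient before steady state.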