Weighted ensemble: practical variance reduction techniques

Author: Johnson, Mats S.
Advisor: Aristoff, David
Committee members: Cheney, Margaret; Krapf, Diego; Pinaud, Olivier
Publisher: Colorado State University. Libraries
Date: 2022-05-30
Language: eng
Type: Text (born digital, masters theses)
Keywords: path-sampling; weighted ensemble; variance
Handle: https://hdl.handle.net/10217/235230
DOI: https://doi.org/10.25675/3.04269
Rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.

Abstract:

Computational biology and chemistry abound with important quantities that researchers wish to estimate. The mean first-passage time (MFPT) is one such quantity; it is sought in molecular dynamics simulations of protein conformational changes, enzyme reaction rates, and more. Simulating these processes is often hindered by the prohibitively small probability of observing the events of interest. For such rare events, direct estimation by Monte Carlo techniques can suffer from high variance. We analyzed weighted ensemble, an importance sampling algorithm based on splitting and killing trajectories, as a way to address these drawbacks. We used weighted ensemble to estimate the MFPT for a stochastic process governed by a Markov chain (X_t)_{t≥0} with steady-state distribution μ. Weighted ensemble works by partitioning the state space into bins and replicating trajectories in an advantageous and unbiased manner. By introducing a recycling boundary condition, we improved the convergence of the problem to steady state and used the Hill relation to estimate the MFPT. This change allows relevant conclusions to be drawn from simulations on much shorter time scales than direct estimation of the MFPT requires.
After defining the weighted ensemble algorithm, we decomposed the variance of the weighted ensemble estimator in a way that allows simple optimization problems to be posed. We also defined the relevant coordinate for splitting trajectories in the weighted ensemble method, the flux-discrepancy function, along with its associated variance function. Combined with the variance formulas, the flux-discrepancy function was used to guide the choice of binning and replication strategies for the weighted ensemble algorithm. Finally, we discussed practical implementations of solutions to these optimization problems and demonstrated their effectiveness on a toy problem. We found that the techniques we presented offered a significant variance reduction over both a naive implementation of weighted ensemble that is commonly used in practice and direct simulation by naive Monte Carlo. The optimizations we presented translate into a reduced computational cost for implementing the weighted ensemble algorithm. We further found that our results remained applicable even with limited resources, which makes them even more appealing.
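
The procedure summarized in the abstract (binning, replication, recycling, and the Hill relation) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the toy chain is a biased random walk on the integers reflecting at 0, each integer state is its own bin, every occupied bin receives the same number of trajectories (the naive uniform allocation, not the optimized strategies), and all function names are hypothetical. Trajectories that reach the target are recycled to state 0, and the MFPT is estimated as the reciprocal of the time-averaged weight flux into the target, which is the Hill relation for a discrete-time chain.

```python
import random
from collections import defaultdict


def step(state, p_up, rng):
    """One move of the toy walk: up with probability p_up, else down (reflecting at 0)."""
    if rng.random() < p_up:
        return state + 1
    return max(state - 1, 0)


def resample(particles, per_bin, rng):
    """Selection step: each occupied bin keeps its total weight, shared
    equally by per_bin children drawn in proportion to parent weights."""
    bins = defaultdict(list)
    for state, weight in particles:
        bins[state].append((state, weight))  # assumption: one bin per state
    out = []
    for members in bins.values():
        total = sum(w for _, w in members)
        states = [s for s, _ in members]
        weights = [w for _, w in members]
        children = rng.choices(states, weights=weights, k=per_bin)
        out.extend((s, total / per_bin) for s in children)
    return out


def weighted_ensemble_mfpt(target, p_up, per_bin, n_steps, burn_in, seed=0):
    """Estimate the MFPT from 0 to target as 1 / (mean steady-state flux)."""
    rng = random.Random(seed)
    particles = [(0, 1.0)]  # all weight starts at the source state
    flux_sum = 0.0
    for t in range(n_steps):
        particles = resample(particles, per_bin, rng)
        moved = []
        flux = 0.0
        for state, weight in particles:
            new_state = step(state, p_up, rng)
            if new_state == target:
                flux += weight  # weight arriving at the target this step
                new_state = 0   # recycling boundary condition
            moved.append((new_state, weight))
        particles = moved
        if t >= burn_in:
            flux_sum += flux
    mean_flux = flux_sum / (n_steps - burn_in)
    total_weight = sum(w for _, w in particles)
    mfpt = 1.0 / mean_flux if mean_flux > 0 else float("inf")
    return mfpt, total_weight


def exact_mfpt(target, p_up):
    """Exact MFPT of the toy walk from 0 to target, via first-step analysis.

    With d_i = T_i - T_{i+1}: d_0 = 1/p and d_i = 1/p + (q/p) d_{i-1}.
    """
    q = 1.0 - p_up
    d = 1.0 / p_up
    total = d
    for _ in range(1, target):
        d = 1.0 / p_up + (q / p_up) * d
        total += d
    return total


if __name__ == "__main__":
    est, total_w = weighted_ensemble_mfpt(
        target=6, p_up=0.3, per_bin=8, n_steps=5000, burn_in=500, seed=1
    )
    print("weighted ensemble estimate:", est)
    print("exact MFPT for the toy walk:", exact_mfpt(6, 0.3))
    print("total weight (should stay 1):", total_w)
```

In this sketch the selection step preserves the total weight of every bin, so the weights always sum to one and the flux estimate stays unbiased at each step. The bin layout and the number of trajectories per bin are the parameters that the variance formulas and the flux-discrepancy function are meant to guide; the uniform choice above is only a baseline.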