Thursday, August 13, 2015

Kinsol logs and troubleshooting slow models

by Laura Condon
Keywords: watershed model, getting started, how to, troubleshooting

Manual Location
Running ParFlow: 3.2
Richards’s Equation Solver Parameters: 6.1.33
ParFlow Solvers: 2.3
Annotated input scripts: 3.6
Directory of test cases: 3.5
Overland Flow Equations: 5.4

This post gives some tips for where to start if your model is running but it isn’t solving well. 

The best way to monitor the progress of your runs is by looking at the kinsol.log file that will be output in your run directory. If you don’t have a kinsol.log file then your model isn’t actually running, most likely because your inputs are misspecified. This blog post lists some common things to check if your model doesn’t run.

Assuming that your model has started running, the first thing to do is see what’s going on with the solver.

Looking at your kinsol.log file:
The outputs for a typical time step in the kinsol.log file should look something like this:

KINSOL starting step for time 1.000000
scsteptol used:        1e-30
fnormtol  used:        1e-06
KINSolInit nni=    0  fnorm=          2859.631546316827  nfe=     1
KINSol nni=    1 fnorm=          2798.943588424358 nfe=     5
KINSol nni=    2 fnorm=          1264.208176165057 nfe=     8
KINSol nni=    3 fnorm=          1221.629858300943 nfe=    13
KINSol nni=    4 fnorm=          903.7712374454168 nfe=    15
KINSol nni=    5 fnorm=           408.050165635947 nfe=    18
KINSol nni=    6 fnorm=          363.6154118376153 nfe=    26
KINSol nni=    7 fnorm=          4.635468210055366 nfe=    27
KINSol nni=    8 fnorm=        0.06076322632612002 nfe=    28
KINSol nni=    9 fnorm=      5.862763215189762e-05 nfe=    29
KINSol nni=   10 fnorm=      7.663997383858169e-07 nfe=    30
KINSol return value 1
---KINSOL_SUCCESS

--------------------------------------------------
                    Iteration             Total
Nonlin. Its.:              10                10
Lin. Its.:                121               121
Func. Evals.:              30                30
PC Evals.:                 10                10
PC Solves:                131               131
Lin. Conv. Fails:           0                 0
Beta Cond. Fails:           0                 0
Backtracks:                 0                 0
--------------------------------------------------

For every nonlinear iteration (nni), the log lists the residual (fnorm) and the running total of function evaluations (nfe) completed up to and including that iteration; in the example above nfe ends at 30, matching the Func. Evals. total in the summary. If the model is solving well, you should see the fnorm decreasing quickly. Once the fnorm falls below the nonlinear residual tolerance set in your tcl script (Solver.Nonlinear.ResidualTol), ParFlow considers the solution converged, reports ‘KINSOL_SUCCESS’, and summarizes the work done in that time step.
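
If you want to plot or compare residuals rather than read them by eye, the iteration lines can be pulled out of the log with a short script. The following is a minimal Python sketch, assuming the log lines follow the format shown above. The function name parse_fnorms and the sample string are illustrative only and are not part of ParFlow.

```python
import re

# Matches iteration lines such as
#   KINSol nni=    3 fnorm=   1221.629858300943 nfe=    13
# and the initial "KINSolInit" line. Leading whitespace is tolerated.
ITER_LINE = re.compile(
    r"^\s*KINSol(?:Init)?\s+nni=\s*(\d+)\s+fnorm=\s*(\S+)\s+nfe=\s*(\d+)"
)


def parse_fnorms(log_text):
    """Return a list of (nni, fnorm, nfe) tuples, one per iteration line."""
    rows = []
    for line in log_text.splitlines():
        match = ITER_LINE.match(line)
        if match:
            nni = int(match.group(1))
            fnorm = float(match.group(2))
            nfe = int(match.group(3))
            rows.append((nni, fnorm, nfe))
    return rows


# A few lines copied from the example log above.
sample = """KINSolInit nni=    0  fnorm=          2859.631546316827  nfe=     1
KINSol nni=    1 fnorm=          2798.943588424358 nfe=     5
KINSol nni=    2 fnorm=          1264.208176165057 nfe=     8
KINSol return value 1
"""

for nni, fnorm, nfe in parse_fnorms(sample):
    print(nni, fnorm, nfe)
```

To run this on a real log you would pass the contents of your kinsol.log file to parse_fnorms instead of the sample string.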

When you are first starting a simulation it’s normal for the model to require a lot of iterations to converge. Usually, you should see the number of nonlinear iterations (Nonlin. Its.), linear iterations (Lin. Its.) and function evaluations (Func. Evals.) per time step decreasing as the model gets going.
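
One way to check whether the work per time step is going down is to collect the Nonlin. Its. value from each summary block. The sketch below assumes that every step begins with a "KINSOL starting step for time" line and that the first number on the "Nonlin. Its.:" line is the per-step count, as in the example above. The function name iterations_per_step is hypothetical, and the second step in the sample uses made-up numbers purely for illustration.

```python
import re

STEP_START = re.compile(r"KINSOL starting step for time\s+(\S+)")
NONLIN_ITS = re.compile(r"^\s*Nonlin\. Its\.:\s+(\d+)\s+(\d+)")


def iterations_per_step(log_text):
    """Return (time, nonlinear iterations) for each step summary in the log."""
    results = []
    current_time = None
    for line in log_text.splitlines():
        start = STEP_START.search(line)
        if start:
            current_time = float(start.group(1))
            continue
        its = NONLIN_ITS.match(line)
        if its:
            # First column is the count for this step, second is the total.
            results.append((current_time, int(its.group(1))))
    return results


# The first step is taken from the example log; the second step is invented
# here only to show a decreasing iteration count.
sample = """KINSOL starting step for time 1.000000
Nonlin. Its.:              10                10
KINSOL starting step for time 2.000000
Nonlin. Its.:               6                16
"""

print(iterations_per_step(sample))  # [(1.0, 10), (2.0, 6)]
```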

Sometimes though, ParFlow will be unable to solve a time step. This will happen if: 
  • The fnorm has not dropped below the nonlinear residual tolerance (Solver.Nonlinear.ResidualTol) within the maximum allowed number of nonlinear iterations (Solver.Nonlinear.MaxIter). In this case you will get the following message: ‘KINSOL_LNSRCH_NONCONV’. 
  • The fnorm has stagnated and the difference in fnorm between two consecutive nonlinear iterations is less than the step tolerance (Solver.Nonlinear.StepTol). In this case you will get the following message: ‘KINSOL_STEP_LT_STPTOL  (stagnating)’.
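
For a long run it can be quicker to count how often each of these messages appears than to scroll through the log. The following Python sketch tallies every token of the form KINSOL_... in the log text. It assumes the failure messages are written as uppercase tokens like the ones quoted above; the "---" prefix on the failure line in the sample is an assumption carried over from the success line, and the function name count_statuses is hypothetical.

```python
import re
from collections import Counter

# Uppercase status tokens such as KINSOL_SUCCESS. The "KINSOL starting step"
# and "KINSol nni=" lines do not match because they lack the underscore.
STATUS = re.compile(r"KINSOL_[A-Z_]+")


def count_statuses(log_text):
    """Count how many times each KINSOL_* status token appears in the log."""
    return Counter(STATUS.findall(log_text))


# Illustrative sample: two successful steps and one stagnated step.
sample = """KINSOL starting step for time 1.000000
---KINSOL_SUCCESS
KINSOL starting step for time 2.000000
---KINSOL_SUCCESS
KINSOL starting step for time 3.000000
---KINSOL_STEP_LT_STPTOL  (stagnating)
"""

for status, count in count_statuses(sample).items():
    print(status, count)
```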

If ParFlow can’t solve a time step then it will try cutting it in half and solving the half step instead. However, if the maximum number of convergence failures (Solver.MaxConvergenceFailures) has been exceeded then the model will stop running completely.
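
As a worked example of the halving, the Python sketch below lists the step sizes that would be tried after a run of back-to-back failures. It is only an arithmetic illustration, not ParFlow’s implementation: it assumes the failures are consecutive and that each one halves the most recently attempted step. The function name and the values 1.0 and 3 are chosen for illustration.

```python
def step_sizes_after_failures(initial_dt, max_failures):
    """List the step size tried after each successive convergence failure."""
    sizes = []
    dt = initial_dt
    for _ in range(max_failures):
        dt = dt / 2.0  # each failure cuts the attempted step in half
        sizes.append(dt)
    return sizes


# Starting from a step of 1.0 with three failures allowed.
print(step_sizes_after_failures(1.0, 3))  # [0.5, 0.25, 0.125]
```

So with a starting step of 1.0 and three allowed failures, the smallest step attempted before the run stops would be 0.125 under these assumptions.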

This is a very brief summary of the kinsol.log file. For more detailed information you can refer to the kinsol manual.

What to do if your model isn’t converging:
If your model isn’t solving well or is solving really slowly then your first step should be to take a look at your inputs and the latest outputs (assuming it has made it through at least one time step) and look for any obvious issues. Some good things to check are:

  • All of your inputs have the same dimensions and your vertical layering is correct (i.e. when you formatted your input files for ParFlow you put everything in the correct order according to Chapter 6 of the manual). A quick way to do this is by looking at your silo files using VisIt or another visualization software.
  • Check that the range of values for all of your inputs is correct and that they are all in the consistent length and time units you are using for your simulation.
  • Look at your pressure file and see if you have areas of very high or very low pressure. ParFlow’s default equation for overland flow uses the kinematic wave approximation (see manual section 5.4) which means that if you have surface cells that don’t drain, any ponded surface water will build up and this can start to slow down the solver. For tips on terrain processing to make sure your domain fully drains refer to this blog post. 
  • Look at your pressure file and see if you have a lot of places where surface water flow is just starting to form (i.e. you have pressure heads in the top cell that are close to zero). This isn’t a problem you need to fix, but it does explain why your model is solving slowly: when surface water flow is turning on and off, the problem becomes much more difficult to solve.
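
The two pressure checks above can also be done numerically once the top-layer pressure heads are loaded into Python. The sketch below assumes you already have the top layer as a list of rows of pressure heads in your model’s length units; reading the ParFlow output file is not shown. The function name summarize_top_pressure and the tolerance of 0.01 are hypothetical choices, not ParFlow settings.

```python
def summarize_top_pressure(top_layer, near_zero_tol=0.01):
    """Summarize top-layer pressure heads given as a list of rows.

    near_zero: cells where overland flow may be switching on and off.
    ponded:    cells with pressure head clearly above zero.
    """
    values = [p for row in top_layer for p in row]
    near_zero = sum(1 for p in values if abs(p) <= near_zero_tol)
    ponded = sum(1 for p in values if p > near_zero_tol)
    return {
        "min": min(values),
        "max": max(values),
        "near_zero": near_zero,
        "ponded": ponded,
    }


# Made-up 2 x 3 top layer for illustration.
top = [
    [-2.0, -0.005, 0.0],
    [0.004, 0.5, -1.2],
]

print(summarize_top_pressure(top))
```

A large max in a cell that should drain, or a large near_zero count, points to the two situations described in the list above.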

If you are satisfied from these checks that you don’t have any errors in how you set things up then you have a couple of options to speed up the simulation:
  • Changing your time step is often helpful at the beginning of a simulation to get things started. Sometimes you may need a smaller time step at first, and once things start solving smoothly you can increase it again.
  • Changing to the non-symmetric preconditioner can improve performance by incorporating the full Jacobian matrix (as opposed to just the symmetric parts). You can set this as follows:
         pfset Solver.Linear.Preconditioner.SymmetricMat Nonsymmetric
  • Also, if it looks like the solver is making progress but just isn’t getting there fast enough you can adjust your solver settings to try to help it along. Some common changes are to increase the number of nonlinear iterations allowed (Solver.Nonlinear.MaxIter) or increase the tolerance for convergence (Solver.Nonlinear.ResidualTol). Refer to the manual section 6.1.33 for a more detailed list of solver settings and options. For examples of solver settings refer to the annotated input scripts in the manual (section 3.6) and the example problems in the test directory included in your ParFlow install (refer to manual section 3.5 for a summary of the test cases included in this directory). However, you should be careful about changing your solver parameters just to get convergence because this could be masking underlying problems. ParFlow has powerful solvers and if it’s getting stuck this can be an important indication that you have some large discontinuities somewhere and something isn’t right in your model.
  • You can also decrease your run time by increasing the number of processors you are running on. Note that this will only help if your model is actually solving; it will do nothing to fix solutions that can’t converge. For details on the computational scaling performance of ParFlow refer to this blog post and the references listed below.
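
Before raising Solver.Nonlinear.MaxIter, it helps to check whether the solver really is making progress. One rough way to do that is to look at the ratio of each fnorm to the previous one. The Python sketch below is a simple heuristic under stated assumptions, not something ParFlow or kinsol computes: the function names, the window of 3 iterations, and the threshold of 0.9 are all hypothetical, and the residuals are assumed to be positive.

```python
def reduction_ratios(fnorms):
    """Ratio of each residual to the previous one; values below 1 mean progress."""
    return [curr / prev for prev, curr in zip(fnorms, fnorms[1:])]


def looks_stalled(fnorms, window=3, threshold=0.9):
    """True if each of the last `window` ratios is above `threshold`."""
    ratios = reduction_ratios(fnorms)
    if len(ratios) < window:
        return False
    return all(r > threshold for r in ratios[-window:])


# Residuals from the example time step above, rounded.
converging = [2859.63, 2798.94, 1264.21, 1221.63, 903.77, 408.05,
              363.62, 4.64, 0.0608, 5.86e-05, 7.66e-07]
# Made-up residuals that barely change between iterations.
stalling = [100.0, 98.0, 97.0, 96.5]

print(looks_stalled(converging))  # False
print(looks_stalled(stalling))    # True
```

If the ratios are steadily below 1, allowing more iterations may be enough. If they hover near 1, more iterations are unlikely to help and the input checks above are the better place to look.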

NOTE: If you find the terminology in this blog post hard to follow it might be helpful to do some background reading on numerical methods for partial differential equations. 

References:
Beisman, J.J., Maxwell, R.M., Navarre-Sitchler, A.K., Steefel, C.I., and Molins Rafa, S. ParCrunchFlow: An Efficient, Parallel Reactive Transport Simulation Tool for Physically and Chemically Heterogeneous Saturated Subsurface Environments. Computational Geosciences, 19(2), 403-422, doi:10.1007/s10596-015-9475-x, 2015.

Osei-Kuffuor, D., Maxwell, R.M. and Woodward, C.S. Improved Numerical Solvers for Implicit Coupling of Subsurface and Overland Flow. Advances in Water Resources, 74, 185-195, doi:10.1016/j.advwatres.2014.09.006, 2014.

Maxwell, R.M. A terrain-following grid transform and preconditioner for parallel, large-scale, integrated hydrologic modeling. Advances in Water Resources, 53:109-117, doi:10.1016/j.advwatres.2012.10.001, 2013.

Kollet, S.J., Maxwell, R.M., Woodward, C.S., Smith, S.G., Vanderborght, J., Vereecken, H., and Simmer, C. Proof-of-concept of regional scale hydrologic simulations at hydrologic resolution utilizing massively parallel computer resources. Water Resources Research, 46, W04201, doi:10.1029/2009WR008730, 2010.

Kollet, S.J. and Maxwell, R.M. Integrated surface-groundwater flow modeling: A free-surface overland flow boundary condition in a parallel groundwater flow model. Advances in Water Resources, 29(7), 945-958, 2006.

Jones, J.E. and Woodward, C.S. Newton–Krylov-multigrid solvers for large-scale, highly heterogeneous, variably saturated flow problems. Advances in Water Resources, 24, 763-774, 2001.

Ashby, S.F. and Falgout, R.D. A Parallel Multigrid Preconditioned Conjugate Gradient Algorithm for Groundwater Flow Simulations, Nuclear Science and Engineering, 124(1): 145-159, 1996.



