Monday, September 24, 2007

Just what is the role of TCL in ParFlow?

A common question arises around the role of the TCL script in ParFlow. These questions often center on how ParFlow is executed and whether the executable is TCL/TK or something else. ParFlow is written primarily in C, and the executable sits in the /exe/bin directory. How it is actually run is a bit more complicated. ParFlow uses a TCL script (with modified "pftcl" commands) to act as a front end for the executable. The primary role of TCL is to build the database which ParFlow reads as input. This database is a text file, with the extension .pfidb, containing each database key and that key's value. These keys are all set using the pftcl command pfset and are written to the database (with some minor error checking) when the pfrun command is executed from the TCL script. This is all done by the parflow.tcl script, which also sits in /exe/bin. At that point the script does some operations to set up parallel parameters (number of processors, etc.) and then executes another script called run. This script runs the ParFlow executable, using a command line syntax that depends on a number of things, such as the type of architecture and whether ParFlow is being run in parallel or serial.
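For orientation, here is a stripped-down sketch of what such an input script looks like (this is only a skeleton, not a runnable problem definition; a real script sets many more keys):

# minimal sketch of a ParFlow Tcl input script
lappend auto_path $env(PARFLOW_DIR)/bin
package require parflow
namespace import Parflow::*

pfset FileVersion 4
# ... many pfset calls defining the grid, geometries, properties and solver ...

# pfrun writes the my_run.pfidb database and launches the run script/executable
pfrun my_run
pfundist my_run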
This can be thought of as follows:


This diagram illustrates that even though the TCL scripting language is used to set up and run ParFlow, it is not really involved in running ParFlow at a basic level. This is usually transparent to the user, but occasionally the run or even the parflow.tcl file needs to be modified to fit the syntax of running on a particular platform. The run file can be modified fairly easily to fit any specific system. In this script there are if-then constructs to identify what platform a user is running on, followed by machine-specific launch commands for that platform. For example, on a Linux cluster ParFlow is launched with mpirun:

mpirun -nodes $3 -np $2 $PARFLOW_DIR/bin/$PROGRAM $1

Here $PARFLOW_DIR is the environment variable set to the location of ParFlow and $PROGRAM is the executable, parflow. The first argument ($1) is the run name, the second ($2) is the number of processes and the third ($3) is the number of nodes used in the simulation. Of course, on a system that does not use mpirun (or uses a variation with different command arguments) this file can be edited to match that system's specifics. One caution: this file is overwritten when ParFlow is rebuilt, so make sure you save a copy. After a rebuild you will need to copy your modified run script back over, or you can replace the original version in /src/amps/mpi1 (if you are using MPI; use caution unless you know what you are doing here!).

This article should provide some understanding of how TCL is used in ParFlow and how ParFlow is run. It also provides some tips on how to get ParFlow to run on a particular parallel system.

Thursday, September 13, 2007

Running ParFlow in Parallel

When running ParFlow on a cluster, a common question is how much parallel speedup to expect. One of the really nice things about ParFlow is how easy it is to run in parallel: simply change the process topology keys, P, Q and R. The computational domain is split up according to those values, nx/P by ny/Q by nz/R cells per processor, for a total of P*Q*R processors.
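As a concrete illustration (the values here are arbitrary), the keys below split the domain in x and y only, giving 2*2*1 = 4 processors, each holding an nx/2 by ny/2 by nz block of cells:

# hypothetical 4-processor topology: split in x and y, not in z
pfset Process.Topology.P 2
pfset Process.Topology.Q 2
pfset Process.Topology.R 1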

There are a number of reasons to run in parallel: faster execution times, or running a problem that is too large to fit in the memory of a single machine. It is also important to know the optimal number of processors for your particular problem. This can be very helpful in setting up a run, and there are a few metrics that can help determine it.

A common approach (e.g. Kollet and Maxwell, 2006; Jones and Woodward, 2001) is to use the scaled efficiency, E, defined as E(n,p) = T(n,1)/T(pn,p), where T is the run time (i.e. "wall clock" time) as a function of the problem size, n, and the number of processors, p. The important point is that the problem grows linearly with the number of processors, so the number of compute cells per processor always stays the same. If everything is perfect, parallel code and parallel hardware, E(n,p) = 1: doubling the problem size and the number of processors results in the same run time. For example (with hypothetical numbers), if a run of n cells on one processor takes 100 minutes and a run of 4n cells on four processors takes 125 minutes, then E(n,4) = 100/125 = 0.8.

The world is not perfect, of course, and many factors influence scaled efficiency. The biggest ones are the things that do not speed up in parallel (e.g. problem setup and inter-processor communication) compared to the things that do. In practice the dominant factor is usually the amount of time the code and architecture spend communicating between processors versus the amount of time each processor spends solving its portion of the problem. This boils down to how big a problem you are solving and how much of that problem each processor owns. In other words, if a problem contains a large number of cells and each processor holds a large number of cells, scaling will be much more efficient than if each processor holds only a few cells, because each cpu will spend more time computing and less time communicating. This is illustrated by the graph below:


In this graph, scaled parallel efficiency is plotted as a function of processor number for small and large problem sizes (from Kollet and Maxwell, 2006, used w/ permission). We see that the larger problem reaches a stable scaled efficiency after just a few processors and maintains this efficiency out to 100 processors, while the small problem shows much poorer efficiency and never reaches a stable value. This is due to the ratio of the work the system does communicating compared to the work it does solving the problem.

It is very easy to run a scalability study with ParFlow. Below I’ll modify the harvey_flow.tcl script and use tcl variables to automatically increase the problem size and the number of processors simultaneously.

At the beginning of the script we add a scaling variable using Tcl. I'm just scaling the number of cells, the number of processors and the domain size in y, but this could easily be done in more than one dimension. We do this by adding the Tcl variable sy and using it to set the processor topology:
set sy 1
# Process Topology
pfset Process.Topology.P 1
pfset Process.Topology.Q $sy
pfset Process.Topology.R 1


Then we set the computational grid as a multiplier of sy:

pfset ComputationalGrid.NX 50
pfset ComputationalGrid.NY [ expr 100*$sy ]
pfset ComputationalGrid.NZ 100


Then the domain, upper and lower aquifer geometries:

pfset Geom.domain.Upper.Y [ expr 34.2*$sy ]
...
pfset Geom.upper_aquifer.Upper.Y [ expr 34.2*$sy ]
...
pfset Geom.lower_aquifer.Upper.Y [ expr 34.2*$sy ]
...


We now have a script that increases the problem size and the number of processors together by changing a single variable. Lastly, we can delete the looping/Monte Carlo portion of the original script and replace it with a run command that uses sy, so that our output, namely the log file which contains the run-time information, is tagged with the processor/problem-size information:

pfrun pc_scale.$sy
pfundist pc_scale.$sy
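One optional tweak (a sketch, not part of the original harvey_flow.tcl): rather than editing set sy 1 by hand before each run, the scaling factor can be read from the command line so the same script serves every run size:

# hypothetical replacement for "set sy 1" above; run as, e.g., tclsh pc_scale.tcl 4
if { $argc >= 1 } {
    set sy [lindex $argv 0]
} else {
    set sy 1
}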


I ran this same script on my laptop (a dual-core cpu) and on my Linux cluster to look at scaling efficiency. The run times are contained in the log file, named pc_scale.$sy.out.log (e.g. pc_scale.2.out.log for the two-processor case) if you use my pfrun command syntax above. The results are below:


We can see that for this problem the scaling on the cluster is pretty good out to eight processors, and that it is quite a bit better than using one or two cores of my laptop. The scaled efficiency is comparable to, but not as good as, the plot I showed above from Kollet and Maxwell, meaning the communication overhead may still be large and we might benefit from running an even larger problem (500,000 cells per cpu is not that many, really). This curve does tell us that we can expect reasonable decreases in simulation time out to eight processors on the cluster.

We can run another test that is useful, particularly if we have a problem of a given size in mind and want to determine how many CPUs to run it on. I used the same script, modified as above, but increased the number of processors while keeping the problem size constant at 500,000 cells. You can do this by keeping sy at 1 and changing the Q processor topology manually, for example. When I do this on my laptop and cluster, I can plot scaled execution time (time to run on n cpus / time to run on 1 cpu) as a function of the number of processors. This is plotted below:

In this plot we clearly see the point where splitting the problem across additional processors yields no decrease in execution time. In fact, moving from four to eight processors slightly increases the execution time. We can also see differences in the decrease in execution time from using one or two cores on the laptop as opposed to more processors on the cluster.

These results depend entirely on problem size (recall that 500,000 cells is a pretty small problem to begin with), computer architecture and load balancing. Running a larger problem would change these numbers (as demonstrated by the scaled efficiency graph above), running on different machines changes these numbers (e.g. laptop v. cluster), and how the problem is split up (in x, y or z) will also change these numbers.

How the problem is split up, or allocated to different processors (i.e. load balancing), needs to account for various aspects of the problem being solved, and this depends quite a lot on the physics. For the example I've shown here we are solving fully saturated flow using the linear solver, and there are no big differences in problem physics over the domain. This changes if we solve using the Richards' equation nonlinear solver, where parts of the problem are linear (below the water table) and parts are nonlinear (above the water table). This point is made very clearly in Jones and Woodward (2001), where scaling efficiencies from splitting a problem in x and y are shown to be much better than those from splitting the problem in z. It is easy to imagine situations using the overland flow boundary condition that would be even more sensitive to this, for example if the problem is split in x and y with a river on some processors (yielding extra per-processor work) but not others.

The best approach is to run your problem, or something very much like it, for a few trial runs to determine an optimal number of processors and topology. I hope this provides some background information to get things started.

Wednesday, September 12, 2007

ParFlow-- core references

There are three journal articles that detail the core solver technology in ParFlow. Each provides information about a particular aspect: the linear solver, the nonlinear solver and the overland flow component. I've provided DOI links to each reference as well for easy access.

The reference for the linear solver is: Ashby, S.F. and R.D. Falgout (1996). A parallel multigrid preconditioned conjugate gradient algorithm for groundwater flow simulations. Nuclear Science and Engineering, 124(1):145-159. (NSE is only available online from 1997 on; a report version of this paper may be downloaded here.)

The reference for the nonlinear solver is: Jones, J.E. and C.S. Woodward (2001). Newton-Krylov-multigrid solvers for large-scale, highly heterogeneous, variably saturated flow problems. Advances in Water Resources, 24:763-774.

The reference for the overland flow boundary condition is: Kollet, S.J. and R.M. Maxwell (2006). Integrated surface-groundwater flow modeling: a free-surface overland flow boundary condition in a parallel groundwater flow model. Advances in Water Resources, 29(7):945-958.

Another paper that is a very nice example of an application of ParFlow is: Tompson, A.F.B., Falgout, R.D., Smith, S.G., Bosl, W.J. and Ashby, S.F. (1998). Analysis of subsurface contaminant migration and remediation using high performance computing. Advances in Water Resources, 22(3):203-221.
These references should help provide a theoretical (and applied) basis for how ParFlow works and how it can be applied to subsurface flow problems.

Tuesday, September 11, 2007

Using Indicator Files as input to ParFlow

A common question is how to get spatially varying input, such as soil types, into ParFlow. Though there are a number of ways to do this, the most common approach is to use an indicator .pfb file with integer values for each land coverage. This is what we've done for the coupled model, where we have different land surface types (such as from IGBP) distributed over the ground surface. An example of this type of coverage is in this paper.

What we do is write the indicator value to the corresponding spatial grid location in a .pfb file, then load this file in. An indicator field is a .pfb file with an integer number for every cell in the computational domain. An example in two dimensions might look as follows.


These values are mapped to geometry names within the ParFlow .tcl input file, and the Tcl/TK input script would look something like this:

pfset GeomInput.indinput.InputType IndicatorField
pfset GeomInput.indinput.GeomNames "field channel subsurf"
pfset Geom.indinput.FileName "my_indicator_input.pfb"

pfset GeomInput.field.Value 1
pfset GeomInput.channel.Value 2
pfset GeomInput.subsurf.Value 3
You can then use these names to assign properties, say permeability:

pfset Geom.Perm.Names "field channel subsurf"

pfset Geom.field.Perm.Type Constant
pfset Geom.field.Perm.Value 0.2

pfset Geom.channel.Perm.Type Constant
pfset Geom.channel.Perm.Value 0.001

pfset Geom.subsurf.Perm.Type Constant
pfset Geom.subsurf.Perm.Value 1.0
This will assign these parameter values anywhere the corresponding indicator value is located in the file. You need to distribute the file before reading it in. To write the file you can modify the FORTRAN code pf_write.f90 located in the /helper_codes directory, or use another program to write an ASCII file and then use pftools to convert it to a .pfb file.
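As a rough sketch of that last route (the grid values here are placeholders, and the simple-ASCII details and command arguments should be checked against the pftools section of the manual), the conversion and distribution can be done with pftools commands inside a Tcl script:

# hypothetical pftools session: convert a simple-ASCII indicator file to .pfb
set ind [pfload -sa my_indicator_input.sa]
# declare the grid the data lives on: {nx ny nz} {x0 y0 z0} {dx dy dz}
pfsetgrid {50 100 100} {0.0 0.0 0.0} {1.0 1.0 1.0} $ind
pfsave $ind -pfb my_indicator_input.pfb

# distribute the .pfb over the processor topology before ParFlow reads it
pfdist my_indicator_input.pfb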

Using Tcl/TK to load files and generate input parameters

A question that comes up frequently in ParFlow is how to load external parameters from a file. This could be transient forcing for boundary conditions, parameters for Monte Carlo simulation (in addition to changing a random seed), or even calibration parameters generated from another program, such as PEST. The example below came from a question recently asked by a ParFlow user and demonstrates how this can be done for a Monte Carlo simulation, though the same approach is adaptable to many other situations.

First, one would load an array of values from a file using Tcl. This is not in the ParFlow manual or addendum, but you can easily do it with native Tcl/TK commands. We use the example from harvey_flow.tcl. Right now that script does something like this:

set fileId [open stats4.txt r 0600]
set kgu [gets $fileId]
set varu [gets $fileId]
set kgl [gets $fileId]
set varl [gets $fileId]
close $fileId

pfset Geom.upper_aquifer.Perm.GeomMean $kgu
pfset Geom.upper_aquifer.Perm.Sigma $varu
This allows us to provide parameters externally, such as from PEST for calibration (as was done in this example). To modify things, one could do something like the following, say to load in a file of n Kg values:

set fileId [open stats4.txt r 0600]
for {set i 1} {$i <= $n} {incr i 1} {
set kg($i) [gets $fileId]
}
close $fileId
Farther down in the harvey_flow.tcl script, where we loop over the realizations and change the random seed, we can also loop through the array of Kg values we've read from the file:

#
# Loop through runs
#
for {set k 1} {$k <= $n_runs} {incr k 1} {
#
pfset Geom.upper_aquifer.Perm.Seed [ expr 33333+2*$k ]
pfset Geom.upper_aquifer.Perm.GeomMean $kg($k)
...

This approach should work in a number of circumstances, as long as you can generate a text file of values outside of the ParFlow simulation script (or one could even generate some relationships within the Tcl script itself). This can easily be done in FORTRAN, in a spreadsheet or in another mathematical program. The approach can easily be modified to load data for wells, boundary conditions or any other transient condition as well.
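As one more sketch along those lines (the file and variable names are hypothetical), a whole column of values can be read into a Tcl list and then pulled out inside a time or realization loop:

# hypothetical: read a time series, one value per line, into a Tcl list
set fileId [open flux_timeseries.txt r]
set flux_values [list]
while { [gets $fileId line] >= 0 } {
    lappend flux_values $line
}
close $fileId
# e.g. [lindex $flux_values 0] is the value for the first time step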

Monday, September 10, 2007

Welcome to the ParFlow blog

Welcome to the ParFlow blog. Here we will post common questions (and answers), tutorials and general information regarding ParFlow.