/ ciao / scripting / runtool.html

How to run CIAO tools from within Python

Aim: Show how CIAO tools can be run from Python.

Summary: We highlight several ways of running CIAO tools from Python, including the routines from the Python subprocess module (which can be used to run any command-line application) and the specialized routines from the ciao_contrib.runtool module, which let you run CIAO tools with a "parameter-based" interface.

Both the subprocess module, provided as part of the standard Python distribution, and the ciao_contrib.runtool module, which is part of the CIAO scripts package, can be used to run CIAO tools from Python as described below. The choice depends on your exact needs, but we suggest trying the ciao_contrib.runtool module first. If you need greater access to the output (such as separating out the stdout and stderr channels), or want to run a non-CIAO tool, then use the subprocess module.

The examples below can be used from either an interactive Python environment, such as provided by Sherpa or IPython, or from a Python "script".

Using the ciao_contrib.runtool module

Check for availability

The module is provided as part of the CIAO scripts package. If the following command exits with a "No such file or directory" error, then please install the scripts package.

unix% cat $ASCDS_CONTRIB/VERSION.CIAO_scripts
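
The same check can be made from within Python. The following is a minimal sketch, assuming only that the ASCDS_CONTRIB environment variable is set as in the command above; the function name scripts_version is hypothetical and is not part of CIAO.

```python
import os
from pathlib import Path


def scripts_version(environ=None):
    """Return the contents of the CIAO scripts VERSION file, or None.

    None is returned when ASCDS_CONTRIB is not set or the file is
    missing, which mirrors the "No such file or directory" case.
    """
    env = os.environ if environ is None else environ
    contrib = env.get("ASCDS_CONTRIB")
    if not contrib:
        return None

    vfile = Path(contrib) / "VERSION.CIAO_scripts"
    if not vfile.is_file():
        return None

    return vfile.read_text().strip()


version = scripts_version()
if version is None:
    print("CIAO scripts package not found")
else:
    print("CIAO scripts package:", version)
```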

Loading the module

The module can be loaded using the standard Python import command, such as any of the following:

from ciao_contrib.runtool import *
from ciao_contrib.runtool import acis_process_events, dmstat, dmmerge
import ciao_contrib.runtool
import ciao_contrib.runtool as rt

If the command returns

ImportError: No module named runtool

then please check the version of the CIAO scripts package, as described above.
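
In a script it can be useful to trap this error and report it clearly rather than fail with a traceback. A minimal sketch follows; the helper name load_runtool is hypothetical, and it relies only on the standard importlib module.

```python
import importlib


def load_runtool():
    """Return the ciao_contrib.runtool module, or None if unavailable."""
    try:
        return importlib.import_module("ciao_contrib.runtool")
    except ImportError:
        # Also covers ModuleNotFoundError, which is a subclass.
        return None


rt = load_runtool()
if rt is None:
    print("ciao_contrib.runtool is not available; has CIAO been started?")
```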

In the following we assume the module has been loaded using

from ciao_contrib.runtool import *

Running a CIAO tool

The module provides routines with the same name as the CIAO tools which, when called, will run the respective tool. Parameter values can be given when the tool is run (using Python named arguments for optional parameters) or beforehand using a "pset" style interface. Any screen output from the tool is returned as a string, as shown in the example below of calling dmstat:

>>> dmstat("img.fits")

PRIMARY(X, Y)[MJy/sr]
    min:        20.022872925          @:        ( 588 665 )
    max:        24.589748383          @:        ( 299 487 )
cntrd[log] :    ( 344.99795211 345.0132221 )
cntrd[phys]:    ( 344.99795211 345.0132221 )
sigma_cntrd:    ( 1708.7638456 1708.8441011 )
   good:        474721
   null:        0

The dmstat call generates screen output which is returned by the call to the dmstat() routine and automatically displayed by Python.

In the following example we capture the screen output in a variable and show how optional parameters - in this case centroid and median - can be set:

>>> a = dmstat("img.fits", centroid=False, median=True)
>>> print(a)
PRIMARY[MJy/sr]
    min:        20.022872925          @:        ( 588 665 )
    max:        24.589748383          @:        ( 299 487 )
   mean:        20.123614861
 median:        20.115715027
  sigma:        0.072704951096
    sum:        9553102.5703
   good:        474721
   null:        0

Parameters can also be set using a pset-style interface before the function is called:

>>> dmstat.punlearn()
>>> dmstat.centroid = False
>>> dmstat.median = True
>>> dmstat.infile = "img.fits"
>>> a = dmstat()

and the two styles can be combined, which allows you to say

>>> dmstat.punlearn()
>>> dmstat.centroid = False
>>> dmstat.median = True
>>> a = dmstat("img.fits")

The current parameter settings for a tool can be found by calling print on the routine:

>>> print(dmstat)
Parameters for dmstat:

Required parameters:
              infile =                  Input file specification

Optional parameters:
            centroid = False            Calculate centroid if image?
              median = True             Calculate median value?
               sigma = True             Calculate the population standard deviation?
                clip = False            Calculate stats using sigma clipping?
              nsigma = 3                Number of sigma to clip
             maxiter = 20               Maximum number of iterations
             verbose = 1                Verbosity level
         out_columns =                  Output Column Label
             out_min =                  Output Minimum Value
         out_min_loc =                  Output Minimum Location Value
             out_max =                  Output Maximum Value
         out_max_loc =                  Output Maxiumum Location Value
            out_mean =                  Output Mean Value
          out_median =                  Output Median Value
           out_sigma =                  Output Sigma Value
             out_sum =                  Output Sum of Values
            out_good =                  Output Number Good Values
            out_null =                  Output Number Null Values
          out_cnvrgd =                  Converged?
       out_cntrd_log =                  Output Centroid Log Value
      out_cntrd_phys =                  Output Centriod Phys Value
     out_sigma_cntrd =                  Output Sigma Centriod Value

The current value for a parameter can be accessed using the toolname.parname syntax, as shown below:

>>> print(dmstat.centroid)
False

Parameter types

Parameter values are matched to the appropriate Python type; boolean parameters such as centroid and median are set using True and False, strings and filenames are given as Python strings and Python numbers are used for numeric types (integer and float parameters).

Parameter names

As shown in the example above, required parameters can be used without names, using the order given in the parameter file, and optional parameters are given using named arguments. Required parameters can also be given by name, so the previous dmstat call could have been written

a = dmstat(infile="img.fits", median=True, centroid=False)

The routines also support the use of unique prefixes for the parameter names, in the same way the command-line interface does, so the above could also have been written as

a = dmstat(inf="img.fits", med=True, cen=False)

Note that the prefix cannot match a Python reserved word, otherwise Python will get horribly confused. An example of this is trying to use in as the prefix for the infile parameter, which will result in a syntax error that looks something like:

>>> a = dmstat(in="img.fits", cen=False, med=True)
------------------------------------------------------------
   File "<ipython console>", line 1
     a = dmstat(in="img.fits", cen=False, med=True)
                 ^
SyntaxError: invalid syntax
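
To make the two rules concrete (a prefix must pick out exactly one parameter, and it must not be a reserved word), here is an illustrative sketch. It is not the code used by ciao_contrib.runtool: the function names are hypothetical, and the rule that an exact match takes priority is an assumption.

```python
import keyword


def match_parameter(prefix, names):
    """Return the single name in names that starts with prefix.

    Assumption: an exact match always wins. Otherwise the prefix must
    identify exactly one name, and AttributeError is raised when there
    is no match or the prefix is ambiguous.
    """
    if prefix in names:
        return prefix

    matches = [name for name in names if name.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise AttributeError("no parameter matches '{0}'".format(prefix))
    raise AttributeError("multiple matches for '{0}': {1}".format(
        prefix, " ".join(matches)))


def usable_as_keyword(prefix):
    """Can prefix be written as a named argument in a Python call?"""
    return prefix.isidentifier() and not keyword.iskeyword(prefix)


names = ["infile", "centroid", "median", "sigma", "clip"]
print(match_parameter("inf", names))   # infile
print(usable_as_keyword("in"))         # False, since "in" is reserved
print(usable_as_keyword("inf"))        # True
```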

Output parameters

Some tools will set parameter values; these values are available after the tool has run using the toolname.paramname syntax. In the example below we use this to get the median value calculated by dmstat.

>>> dmstat("img.fits", centroid=False, median=True, v=0)
>>> print("Median = {0}".format(dmstat.out_median))
Median = 20.115715027

The output parameter values are converted to Python types (e.g. string, int, float, bool) using the same rules as for the input parameters. In this case the output value is actually a string, since there could be multiple values stored depending on the input to dmstat.

By setting the verbose parameter to 0 we stopped any screen output, which meant that the dmstat() call returned None instead of a string.
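
Since these output values are strings, they need to be converted before being used in a calculation. The following sketch assumes that multiple values are separated by commas; the helper name parse_out_values is hypothetical.

```python
def parse_out_values(value):
    """Convert an output-parameter string into a list of floats.

    Assumes multiple values are separated by commas; an empty or
    missing value gives an empty list.
    """
    if value is None or value.strip() == "":
        return []
    return [float(tok) for tok in value.split(",")]


print(parse_out_values("20.115715027"))  # [20.115715027]
print(parse_out_values("4052,3096"))     # [4052.0, 3096.0]
print(parse_out_values(""))              # []
```

With CIAO available, the median from the run above could then be retrieved as a number using parse_out_values(dmstat.out_median)[0].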

Parameter limits

Parameter limits - such as minimum and maximum values or a list of valid answers - are enforced by this module. An error will occur if a limit test is failed, as in the examples below:

>>> dmstat("img.fits", maxiter=0)
ValueError: dmstat.maxiter must be >= 1 but set to 0
>>> dmjoin.verbose = 10
ValueError: dmjoin.verbose must be <= 5 but set to 10
>>> dmjoin.interpolate = "foo"
ValueError: The parameter interpolate was set to foo when it must be one of:
  linear first last closest furthest minimum maximum

Note that, unlike at the command-line, there is no attempt to query the user for a new parameter value.

As with the command-line interface, parameters which are restricted to a set of options - such as the interpolate parameter of dmjoin - can be set using a unique substring, as shown below.

>>> dmjoin.interpolate = "min"
>>> print(dmjoin.interpolate)
minimum

Errors

If the tool fails to run successfully, then an IOError will be raised, containing the screen output from the tool, such as in the following example where dmstat is given a file name that does not exist.

>>> dmstat("does-not-exist")
IOError: An error occurred while running 'dmstat':
  # DMSTAT (CIAO 4.18): could not open infile does-not-exist.

An AttributeError will be raised if you try to use a parameter that does not exist, as in the following example:

>>> dmstat.nclip = 20
AttributeError: There is no parameter for dmstat that matches 'nclip'

Similarly, using a non-unique prefix for a parameter name will also raise an error, as shown below:

>>> dmstat.c = True
AttributeError: Multiple matches for dmstat parameter 'c', choose from:
  centroid clip
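
In a script you may want to trap these errors so that processing can continue. The sketch below wraps any callable, catching the IOError and ValueError cases described in this thread. The names run_tool and fake_tool are hypothetical, and fake_tool is only a stand-in so that the example runs without CIAO; with CIAO you would pass a routine such as dmstat instead.

```python
def run_tool(tool, *args, **kwargs):
    """Call a tool routine and return (succeeded, message).

    IOError (the tool failed) and ValueError (a parameter limit was
    violated) are caught; other errors are left to propagate.
    """
    try:
        output = tool(*args, **kwargs)
    except (IOError, ValueError) as exc:
        return False, str(exc)
    return True, output


def fake_tool(infile):
    """Stand-in for a CIAO tool so the sketch runs without CIAO."""
    if infile == "does-not-exist":
        raise IOError("could not open infile {0}".format(infile))
    return "ok: {0}".format(infile)


print(run_tool(fake_tool, "img.fits"))        # (True, 'ok: img.fits')
print(run_tool(fake_tool, "does-not-exist"))
```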

Parameter files

Running a tool using the runtool module, or setting a parameter value using the toolname.parname syntax, does not change the on-disk parameter file for the tool. This means that it is safe to run multiple copies of a tool at once.

Each tool has a write_params() method to write the current settings to disk and a read_params() method to read in values from a parameter file.

>>> !pset dmjoin interp=clo
>>> dmjoin.read_params()
>>> print(dmjoin.interpolate)
closest

The module provides a set_pfiles() routine which will adjust the PFILES environment used by the tools; this is mostly important for scripts that run multiple tools - such as specextract - or for those tools that need to refer to multiple parameter files - for instance mkinstmap uses the ardlib parameter file.

An example

In the following we use the dmhistory tool to store the settings used to create an instrument map into the parameter file of mkinstmap, read these settings into the mkinstmap tool, adjust a few of them, run the tool and print out some simple run-time details:

>>> dmhistory("imap7.fits", "mkinstmap", action="pset")
>>> mkinstmap.read_params()
>>> print(mkinstmap)
Parameters for mkinstmap:

Required parameters:
             outfile = imap7.fits       Output File Name
        spectrumfile =                  Energy Spectrum File (see docs)
          monoenergy = 1.5              Energy for mono-chromatic map [keV]
           pixelgrid = 1:1024:#1024,1:1024:#1024  Pixel grid specification x0:x1:#nx,y0:y1:#ny
             obsfile = asp7.fits[asphist]  Name of fits file + extension with obs info
           detsubsys = ACIS-7           Detector Name
             grating = NONE             Grating for zeroth order ARF
            maskfile = msk1.fits        NONE, or name of ACIS window mask file
             pbkfile = pbk0.fits        NONE, or the name of the parameter block file

Optional parameters:
              mirror = HRMA             Mirror Name
              dafile = CALDB            NONE, CALDB, or name of ACIS dead-area calibration file
       ardlibparfile = ardlib.par       name of ardlib parameter file
             geompar = geom             Parameter file for Pixlib Geometry files
             verbose = 0                Verbosity
             clobber = False            Overwrite existing files?
>>> mkinstmap.monoen = 4
>>> mkinstmap("imap7.e4.fits")
>>> d = mkinstmap.get_runtime_details()
>>> print(d["delta"])
17 seconds
>>> print(d["args"][:3])
[('outfile', 'imap7.e4.fits'), ('spectrumfile', ''), ('monoenergy', '4.0')]

The get_runtime_details() method returns a dictionary which contains information on the last run of the tool. Here we show the contents of the delta and args keys, which give the run time of the tool as a string and a list of all the arguments used to run the tool (in this case we restrict the output to the first three elements). Other fields are also present; use help(<toolname>.get_runtime_details) to see the full list.

To finish we run mkexpmap and then dmstat on its output, inspecting the parameter settings of the tool.

>>> mkexpmap.punlearn()
>>> mkexpmap.xy = "3530.5:4973.5:1,3094.5:4537.5:1"
>>> mkexpmap("asp7.fits", "emap7.e4.fits", "imap7.e4.fits")
>>> dmstat("emap7.e4.fits[sky=region(fov1.fits[ccd_id=7])]", centroid=False)

EXPMAP[cm**2]
    min:        0             @:        ( 4052 3096 )
    max:        417.29486084          @:        ( 4095 4056 )
   mean:        349.53833266
  sigma:        109.11054852
    sum:        407873484.07
   good:        1166892
   null:        915357
>>> print(dmstat)
Parameters for dmstat:

Required parameters:
              infile = emap7.e4.fits[sky=region(fov1.fits[ccd_id=7])]  Input file specification

Optional parameters:
            centroid = False            Calculate centroid if image?
              median = False            Calculate median value?
               sigma = True             Calculate the population standard deviation?
                clip = False            Calculate stats using sigma clipping?
              nsigma = 3.0              Number of sigma to clip
             maxiter = 20               Maximum number of iterations
             verbose = 1                Verbosity level
         out_columns = EXPMAP           Output Column Label
             out_min = 0                Output Minimum Value
         out_min_loc = 4052,3096        Output Minimum Location Value
             out_max = 417.29486084     Output Maximum Value
         out_max_loc = 4095,4056        Output Maxiumum Location Value
            out_mean = 349.53833266     Output Mean Value
          out_median =                  Output Median Value
           out_sigma = 109.11054852     Output Sigma Value
             out_sum = 407873484.07     Output Sum of Values
            out_good = 1166892          Output Number Good Values
            out_null = 915357           Output Number Null Values
          out_cnvrgd =                  Converged?
       out_cntrd_log =                  Output Centroid Log Value
      out_cntrd_phys =                  Output Centriod Phys Value
     out_sigma_cntrd =                  Output Sigma Centriod Value
>>> print("Mean = {0} +/- {1}".format(dmstat.out_mean, dmstat.out_sigma))
Mean = 349.53833266 +/- 109.11054852

Running multiple copies of a tool

The Python multiprocessing module makes it possible to run tools in parallel, and is used to provide the multiprocessing feature in scripts like fluximage and merge_obs. Alternatively, you can run several copies of your script at the same time. In either case, you may see errors or problems because the same parameter file is being used for each run of the tool (this can lead to invalid results, as inconsistent parameter settings are used, or the tool can fail). To avoid this, use the new_pfiles_environment context manager to create a new copy of your PFILES environment. An example of its use is shown below (this uses Sherpa's parallel_map routine, which provides a simple way to use the multiprocessing module):

import ciao_contrib.runtool as rt
import sherpa.utils

# The asphist/mkinstmap/mkexpmap call below will use their own
# PFILES directory and so will not collide with other runs of
# these tools (or changes to the default ardlib parameter file
# since this has been copied over to the new directory)
#
def getemap(args):
    """Create an ACIS exposure map for the ccd and grid that can
    be run in parallel. The routine can only accept a single
    argument as it is to be called via Sherpa's parallel_map
    routine. As we need multiple values, we therefore use a tuple.

    This routine is only for didactic purposes since it assumes
    a number of things, such as file names, that will not be
    true in general.
    """

    # Python 3 does not allow a tuple in the argument list, so
    # unpack the (ccd, gridstr) pair here.
    ccd, gridstr = args

    # We copy over the ardlib so that if the bad-pixel file
    # has been set, its value is used in the new environment.
    #
    with rt.new_pfiles_environment(ardlib=True):
        afile = "asphist{0}.fits".format(ccd)
        ah = rt.make_tool("asphist")
        ah(infile="asol1.fits", outfile=afile,
           evtfile="evt2.fits[ccd_id={0}]".format(ccd),
           clobber=True)

        ifile = "imap{0}.fits".format(ccd)
        mki = rt.make_tool("mkinstmap")
        mki(outfile=ifile, monoenergy=1.7, pixelgrid="1:1024:1,1:1024:1",
            obsfile=afile+"[asphist]", detsubsys="ACIS-{0}".format(ccd),
            maskfile="msk1.fits", pbkfile="pbk0.fits", clobber=True)

        efile = "emap{0}.fits".format(ccd)
        mke = rt.make_tool("mkexpmap")
        mke(asphistfile=afile, outfile=efile, instmapfile=ifile,
            xygrid=gridstr, clobber=True)

# Create the arguments for CCDs 0-3 and 6
grid = "2824.5:6552.5:#932,2868.5:5328.5:#615"
ccds = [0, 1, 2, 3, 6]
args = [(ccd,grid) for ccd in ccds]

# Run getemap on all available CPUs
sherpa.utils.parallel_map(getemap, args)

Note that new_pfiles_environment automatically creates the temporary parameter directory, changes the PFILES environment variable, runs your code, and then restores the PFILES variable and deletes the temporary directory for you.
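
To illustrate the sequence of steps just described, here is a simplified sketch of such a context manager. It is not the implementation used by ciao_contrib.runtool: the name temporary_pfiles is hypothetical, it does not copy the ardlib parameter file, and it assumes that PFILES has the form "writable directories;system directories".

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_pfiles():
    """Point PFILES at a fresh writable directory for the with block.

    Assumes PFILES looks like "writable_dirs;system_dirs"; only the
    writable part is replaced. The original value is restored and the
    temporary directory removed on exit, even if an error occurs.
    """
    original = os.environ.get("PFILES")
    system_part = (original or "").partition(";")[2]
    tmpdir = tempfile.mkdtemp(prefix="pfiles-")
    os.environ["PFILES"] = "{0};{1}".format(tmpdir, system_part)
    try:
        yield tmpdir
    finally:
        if original is None:
            os.environ.pop("PFILES", None)
        else:
            os.environ["PFILES"] = original
        shutil.rmtree(tmpdir, ignore_errors=True)


before = os.environ.get("PFILES")
with temporary_pfiles() as tmpdir:
    inside = os.environ["PFILES"]
    print(inside.startswith(tmpdir))   # True

# PFILES is back to its old value and the directory has gone.
print(os.environ.get("PFILES") == before, os.path.isdir(tmpdir))
```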

Using the subprocess module

For simple cases, the call() and check_call() routines can be used, but for more complicated cases Popen() should be used.

The following examples assume that the subprocess module has been loaded using

from subprocess import *

Using call() or check_call()

These routines take a list of arguments, with the first one being the tool name and the remaining ones the command-line arguments, and run the tool. Any screen output will be displayed (not returned as a string), and the routines either return the exit status (call()) or raise an error if a non-zero exit status is returned (check_call()).

In the following we show how dmstat can be run using these routines:

>>> call(["dmstat", "img.fits", "cen-", "med+"])
PRIMARY[MJy/sr]
    min:        20.022872925          @:        ( 588 665 )
    max:        24.589748383          @:        ( 299 487 )
   mean:        20.123614861
 median:        20.115715027
  sigma:        0.072704951096
    sum:        9553102.5703
   good:        474721
   null:        0
          0
>>> rval = call(["dmstat", "img.fits", "cen+"])
PRIMARY(X, Y)[MJy/sr]
    min:        20.022872925          @:        ( 588 665 )
    max:        24.589748383          @:        ( 299 487 )
cntrd[log] :    ( 344.99795211 345.0132221 )
cntrd[phys]:    ( 344.99795211 345.0132221 )
sigma_cntrd:    ( 1708.7638456 1708.8441011 )
   good:        474721
   null:        0

As the tool ran successfully the call returned a 0 as shown on the last line of output from the first call. In the second we store the return value in the variable rval.
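
When the parameter values are held in Python variables, the argument list has to be built as strings. The sketch below shows one way of doing this, writing booleans in the yes/no form accepted for CIAO boolean parameters; the helper name to_cli_args is hypothetical.

```python
def to_cli_args(tool, *required, **params):
    """Build an argument list for subprocess from Python values.

    Booleans become yes/no; everything else is passed through str().
    Required values are given positionally, the rest as name=value.
    """
    def convert(value):
        if isinstance(value, bool):
            return "yes" if value else "no"
        return str(value)

    args = [tool] + [convert(v) for v in required]
    args += ["{0}={1}".format(k, convert(v)) for k, v in params.items()]
    return args


print(to_cli_args("dmstat", "img.fits", centroid=False, verbose=0))
# ['dmstat', 'img.fits', 'centroid=no', 'verbose=0']
```

The resulting list can be passed directly to call() or check_call().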

The check_call() routine works as above, but if the return value is not 0 then it raises an error:

>>> rval = check_call(["dmstat", "infile=img.fits", "verbose=0"])
>>> print(rval)
0
>>> rval = check_call(["dmstat", "infile=src.fits", "verbose=0"])
# DMSTAT (CIAO 4.18): could not open infile src.fits.

CalledProcessError: Command '['dmstat', 'infile=src.fits', 'verbose=0']' returned non-zero exit status 1
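
The CalledProcessError can be caught like any other Python exception. In the sketch below the running Python interpreter is used as a stand-in command so that the example works without CIAO; the helper name run_checked is hypothetical, and the module is loaded with import subprocess to keep the example self-contained.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd with check_call and return its exit status.

    A failure is reported rather than raised, so a script can decide
    whether to continue.
    """
    try:
        return subprocess.check_call(cmd)
    except subprocess.CalledProcessError as exc:
        print("Command {0} failed with status {1}".format(exc.cmd,
                                                          exc.returncode))
        return exc.returncode


# Stand-in commands: a Python process that exits with status 0 or 1.
ok = run_checked([sys.executable, "-c", "raise SystemExit(0)"])
bad = run_checked([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, bad)   # 0 1
```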

Using Popen()

Using Popen() provides a great deal of flexibility in how you call a command-line tool: the contents of the stdout or stderr channels can be accessed rather than displaying on screen, long-running processes can be handled, and more. The module documentation for the class provides all the details; below we highlight some common use cases.

Running a tool

>>> p = Popen(["dmstat", "img.fits", "centroid=no"])
>>> PRIMARY[MJy/sr]
    min:        20.022872925          @:        ( 588 665 )
    max:        24.589748383          @:        ( 299 487 )
   mean:        20.123614861
  sigma:        0.072704951096
    sum:        9553102.5703
   good:        474721
   null:        0

>>> p.wait()
0

Here we start the dmstat tool and the interactive prompt appears. Then the tool finishes and creates screen output, which appears as if it is a command (the exact location of the output depends on how fast the tool is run in the background), and then we use the wait() method to ensure that the tool has finished and get its exit status.
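
For a long-running tool you may prefer to check on the process periodically rather than block in wait(). The following sketch uses the poll() method for this; the helper name wait_with_poll is hypothetical, and a short-lived Python process stands in for the tool.

```python
import subprocess
import sys
import time


def wait_with_poll(proc, interval=0.1, timeout=30.0):
    """Wait for proc to end, checking every interval seconds.

    Returns the exit status, or None if timeout seconds pass first
    (the process is left running in that case).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = proc.poll()      # None while the process is running
        if status is not None:
            return status
        time.sleep(interval)
    return None


# Stand-in for a slow tool: a Python process that sleeps briefly.
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.3)"])
status = wait_with_poll(p)
print(status)   # 0
```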

Accessing the screen output

Setting the stdout and stderr arguments to PIPE makes it possible to access these channels, as shown below.

>>> p = Popen(["dmstat", "img.fits", "centroid=no"], stdout=PIPE, stderr=PIPE, text=True)
>>> (sout, serr) = p.communicate()
>>> print(sout)
PRIMARY[MJy/sr]
    min:        20.022872925          @:        ( 588 665 )
    max:        24.589748383          @:        ( 299 487 )
   mean:        20.123614861
  sigma:        0.072704951096
    sum:        9553102.5703
   good:        474721
   null:        0

>>> print(serr == '')
True
>>> print(p.returncode)
0

The communicate() method returns the data from the stdout and stderr channels and waits for the program to end. The text=True argument (Python 3.7 and later; use universal_newlines=True with earlier versions) makes it return strings rather than bytes. In this case there was no output to stderr so the empty string was returned.

If the stderr argument were set to STDOUT instead of PIPE, the first value returned by communicate() would contain all the screen output from the tool, and the second value would be None.
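
A short sketch of the STDOUT case follows. The running Python interpreter is used as a stand-in command that writes one line to each channel, and text=True (Python 3.7 and later) is assumed so that a string rather than bytes is returned.

```python
import subprocess
import sys

# Stand-in command that writes one line to each channel.
code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

p = subprocess.Popen([sys.executable, "-c", code],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     text=True)
sout, serr = p.communicate()

print(serr is None)          # True
print("to stdout" in sout)   # True
print("to stderr" in sout)   # True
print(p.returncode)          # 0
```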
