Synopsis
CIAO Data Model: syntax for filtering and binning files
Description
The CIAO Data Model (DM) is a versatile interface used by CIAO to examine and manipulate standard format datafiles (e.g. FITS, ASCII). The DM enables powerful filtering and binning of datafiles. This document is an introduction to the DM syntax used by the CIAO tools.
Table of Contents
- 1. DM Syntax and Virtual Files
- 2. Virtual Columns
- 3. Renaming and Reordering Columns
Related help files contain information and examples illustrating the capabilities of the DM. A list of these files can also be obtained from the CIAO command line with "about dm" or "ahelp -k dm".
- ahelp dmascii: using text files in CIAO
- ahelp dmbinning: creating images from event files/tables
- ahelp dmfiltering: table and image filtering
- ahelp dmregions: the CIAO region-filtering syntax
- ahelp dmopt: controlling internal DM options (setting NULL characters, providing more memory to a tool)
- ahelp coords: a discussion of the Chandra coordinate systems
- ahelp subspace: how files keep a history of the filters applied to them
Detailed technical information is available from the Introduction to the Data Model memo.
1. DM Syntax and Virtual Files
The Data Model offers an easy and powerful means of filtering data. The filtered file can be directly input to a tool without writing it to disk first; this is known as a "virtual file." The virtual file, which can also be referred to as a subspace, is simply a means of defining a subset of interest in the dataset.
The basic syntax of a virtual file is:
filename[block][filter][binning][option][rename]
filename[block][filter][columns][option][rename]
filename: the input filename. All CIAO tools accept FITS file input, and many also accept ASCII files. Some tools only work on event files, while others require an input image. Refer to the individual tool help files for any restrictions.
[block]: the extension of the file to use, e.g. the name of the image or table. For FITS files, the block corresponds to an HDU and may be identified by name ("[EVENTS]") or number ("[2]"). If the block is not specified, the first "interesting" block is used (e.g. [EVENTS] for an event file). To view the blocks in a file, use "dmlist file.fits blocks".
[filter]: the filter to apply to the data. It indicates, for instance, which time period, energy range, or spatial region to use (e.g. "[time=1522012:1522320,1522400:1522600]"). Refer to "ahelp dmfiltering" for a full discussion of filtering.
[binning]: the binning specification for creating an image from an event file (e.g. "[bin x=10:100:1,y=1:100:1]"). Refer to "ahelp dmbinning" for a full discussion of binning.
[columns]: the names of the columns to include ("[cols time,energy,]") or exclude ("[cols -phas]"). The syntax "[cols !phas]" may also be used, but the "!" symbol needs to be written as "\!" in the Unix shell, making the "-" syntax more convenient.
[option]: advanced options for the DM, such as specifying what the NULL character should be or how much memory to allow a tool to use. Refer to "ahelp dmopt" for a list of the available options.
[rename]: the name for the block in the output file. The default behavior is for the output to have the same block name, unless a file is binned to create an image; in that case, "_IMAGE" is added to the block name. (For information on renaming columns, refer to a later section in this file.)
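The bracketed layout described above can be illustrated with a small Python sketch that splits a virtual file specification into the filename and its qualifiers. This is not the DM parser: the function name split_virtual_file is hypothetical, the example specification is made up, and the sketch assumes that qualifiers never contain nested square brackets.

```python
import re

def split_virtual_file(spec):
    """Split 'name[a][b]...' into the filename and a list of qualifier strings.

    Assumption: qualifiers contain no nested square brackets.
    """
    match = re.match(r"^([^\[]*)((?:\[[^\]]*\])*)$", spec)
    if match is None:
        raise ValueError(f"unbalanced or nested brackets in {spec!r}")
    filename, qualifiers = match.groups()
    # Pull out the text inside each [...] group, in order of appearance.
    return filename, re.findall(r"\[([^\]]*)\]", qualifiers)

# Hypothetical specification: block, then a time filter, then a column list.
filename, parts = split_virtual_file(
    "evt.fits[EVENTS][time=1522012:1522320][cols time,energy]"
)
print(filename)
for part in parts:
    print("  qualifier:", part)
```

The sketch only separates the pieces; deciding whether a given qualifier is a block, filter, binning, column, option, or rename specification is left to the DM itself.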
2. Virtual Columns
A file may contain virtual columns whose values are calculated by applying a mathematical transform to an existing column. Virtual columns - such as EQPOS(RA,DEC) - do not physically exist in the event file; they are defined by the WCS information attached to another column, e.g. SKY.
The transformation is listed in the output of "dmlist evt2.fits cols":
1:   EQPOS(RA ) = (+278.3860)    +TAN[(-0.000136667)* (sky(x)-(+4096.50))]
          (DEC)   (-10.5899 )         (+0.000136667)  (   (y) (+4096.50))
For most applications, these columns may be used the same as non-virtual columns in the file. It is possible to list, filter, and bin on virtual columns.
However, filtering and binning do not work reliably on virtual columns derived from non-monotonic coordinate transforms (e.g. MSC(THETA,PHI), or EQPOS near the poles; see "ahelp coords" for more information on these coordinate systems).
3. Renaming and Reordering Columns
It is possible to rename a column or change the order of the columns within a file. Note that certain CIAO tools require particular column names (e.g. time, energy), but none of the tools make assumptions about the order of the columns within a file.
To rename a column, run dmcopy with the column syntax "newname=oldname". Multiple columns may be renamed in the same command.
dmcopy "pi.fits[cols rate=count_rate]" pi_rate.fits
dmcopy "pi.fits[cols rate=count_rate, rate_err=count_rate_err,*]" \
  pi_rate_all.fits
The "count_rate" column in pi.fits is renamed to "rate" in pi_rate.fits. With the first command, "rate" will be the only column in the output file. In the second command, the "*" operator indicates that all other columns should be copied unchanged to the output file.
The columns will appear in the output file in the order in which they are specified. So in the renaming case, "rate" will be the first column in the output. The "cols" syntax can be used to reorder columns without modifying them as well:
dmcopy "pi.fits[cols energy,time,pi,count_rate, *]" reorder.fits
Note that for a vector column sky(x,y),
"evt.fits[cols x,y]"
will retain the information that (x,y) is a vector column called "sky". Any of the following, however, will separate the vector components and lose the vector-dependent coordinate systems like RA and Dec:
[cols x]
[cols y,x]
[cols x,pha,y]
Examples
Example 1
acisf01843N002_evt2.fits[EVENTS][cols #1,#2,#3]
acisf01843N002_evt2.fits[cols time,ccd_id,node_id]
Select three columns of the EVENTS block by number or by name.
Example 2
acisf01843N002_evt2.fits[#row=1:4]
Select rows 1-4 from a FITS file.
Example 3
dmlist "evt.fits[events][pha=30:200,time=10:20,50:60]" data
Use the tool dmlist to print specific data values from the file to the screen. A filter is applied to the "events" block in the file evt.fits. The filter selects rows in the table for which the value of the pha column is >= 30 and < 200, and for which the time is either >= 10 and < 20 or >= 50 and < 60. Both the pha and time filters must be satisfied for a row to pass the filter.
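The way the filter in this example combines its terms (ranges on one column are alternatives, and every filtered column must match) can be sketched in Python. This is only an illustration, not DM code: the function names and the sample rows are hypothetical, and the half-open bounds simply mirror the wording of the description; "ahelp dmfiltering" remains the reference for the exact endpoint rules.

```python
def in_any_range(value, ranges):
    """True if value falls in at least one (low, high) range.

    Assumption: low is inclusive and high is exclusive, as worded above.
    """
    return any(low <= value < high for low, high in ranges)

def passes(row, filters):
    """True only if every filtered column matches one of its ranges."""
    return all(in_any_range(row[col], ranges) for col, ranges in filters.items())

# Equivalent of [pha=30:200,time=10:20,50:60]
filters = {"pha": [(30, 200)], "time": [(10, 20), (50, 60)]}

# Hypothetical rows, chosen away from the range endpoints.
rows = [
    {"pha": 100, "time": 15},  # pha ok, time in first range
    {"pha": 100, "time": 30},  # pha ok, time in neither range
    {"pha": 10, "time": 55},   # time ok, pha out of range
]
selected = [row for row in rows if passes(row, filters)]
print(selected)
```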
Example 4
acisf01843N002_evt2.fits[EVENTS][bin x=3200:4800:4,y=3200:4800:4]
Bin an event file into an image with this input to the tool dmcopy.
Example 5
acisf01843N002_evt2.fits[EVENTS][bin pi=1:1024:1]
Use this specification as input to the tool dmextract to bin an event file into a PI spectrum.
Example 6
dmcopy "evt.fits[cols -status]" evt_new.fits
Remove the status column from evt.fits.
Bugs
General
When creating a file with a Linear WCS, the CXCDM adjusts the transform parameters to force a half bin offset. While mathematically consistent with the values input, the result may be confusing, especially when the transform is well known, e.g. °C to °F.
--------------------------------------------------------------------------------
World Coord Transforms for Columns in Table Block simple2.dat
--------------------------------------------------------------------------------
ColNo    Name
2:    tempF                = +32.90 [degree F] +1.80 * (tempC -0.50)
which is mathematically correct, but more commonly written as just
tempF = +32 + 1.8 * tempC
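The equivalence of the two forms follows from expanding the offset: 32.90 + 1.80 * (tempC - 0.50) = 32.90 - 0.90 + 1.80 * tempC = 32 + 1.8 * tempC. A short Python check of that arithmetic is sketched below; the function names and the sample temperatures are chosen here for illustration only.

```python
import math

def temp_f_dm(temp_c):
    """Transform as reported by the DM, with the half bin offset."""
    return 32.90 + 1.80 * (temp_c - 0.50)

def temp_f_usual(temp_c):
    """The familiar Celsius to Fahrenheit conversion."""
    return 32.0 + 1.8 * temp_c

# Compare both forms over a few sample temperatures.
samples = (-40.0, 0.0, 37.0, 100.0)
max_diff = max(abs(temp_f_dm(c) - temp_f_usual(c)) for c in samples)
print("largest difference:", max_diff)
```

Any difference is at the level of floating-point rounding, which confirms that only the presentation of the transform changes, not its values.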
There are three issues with the generic use of TC* keywords in FITS files read and written by DM tools.
The first issue: TC*n[A-Z]. The DM function that composes this keyword does not strip off the 'P' before adding the number and letter to the end.
Second issue: TCTY* keys are not recognized and therefore are not processed.
Third issue: DM is not always retaining the T*NAM information when it stores information on the DM descriptor, resulting in output keywords T*TYP instead.
When a FITS file is copied, the EXTNAME is forced on output to be the same as the HDUNAME. Typical Chandra/CIAO files require these to be the same; however, data from other missions and projects may not have the same requirement.
Various CIAO tools, including dmappend, dmhedit, dmreadpar, acis_clear_status_bits, and dmgti, attempt to modify a file's contents in place. In addition, users of crates may write scripts that try to open a file with read+write access.
Files may not be writeable for several reasons. The normal UNIX file permissions may not allow a particular user to modify a file.
Files that are gzipped are never writeable, regardless of whether the file itself has file-write permission.
Standard in and standard out, accessed by the "-" (no quotes) file name, are not writeable.
Individual blocks of data that have filters on the data are not writeable. So
my_evt.fits[sky=region(ds9.reg)]
is not writeable because it is actually filtering the data, while
my_evt.fits[gti3]
is writeable, since the bracketed qualifier only selects which block in the file to use.
Filtering Data
This bug is triggered when filtering an image with the [exclude ] syntax and a region, together with the [opt full] directive to retain the original image size.
unix% dmcopy "img.fits[exclude sky=region(ciao.reg)][opt full]" filt_img.fits
The pixels outside a box that bounds the region are also filtered out (i.e. set to 0).
Workaround:
In many cases users can easily invert the logic in their region files to avoid needing to use the [exclude ] directive. This can be done any time the region only contains included shapes. For example:
unix% cat ciao.reg
circle(...A...)
circle(...B...)
circle(...C...)
...
unix% cat ciao.reg | awk ' BEGIN {print "field()"} {print "-"$0}' > exclude_ciao.reg
unix% cat exclude_ciao.reg
field()
-circle(...A...)
-circle(...B...)
-circle(...C...)
...
This file can then be used without needing the [exclude ] syntax:
unix% dmcopy "img.fits[sky=region(exclude_ciao.reg)][opt full]" filt_img.fits
This has the advantage of also generally being much faster.
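For users who prefer to script the inversion, the same transformation can be sketched in Python. This is an illustrative equivalent of the awk one-liner, not a CIAO utility: the function name invert_region and the circle parameters are hypothetical, and, unlike the awk version, the sketch skips blank lines and refuses input that already contains excluded shapes.

```python
def invert_region(lines):
    """Turn a list of included shapes into 'field()' minus each shape.

    Assumption: one shape per line, all of them included shapes.
    """
    shapes = [line.strip() for line in lines if line.strip()]
    if any(shape.startswith("-") for shape in shapes):
        raise ValueError("region already contains excluded shapes")
    return ["field()"] + ["-" + shape for shape in shapes]

# Hypothetical region contents; real files would be read from ciao.reg.
included = ["circle(100,100,10)", "circle(200,150,5)", ""]
excluded = invert_region(included)
print("\n".join(excluded))
```

Writing the returned lines to a file such as exclude_ciao.reg gives the same result as the awk pipeline for regions that contain only included shapes.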
DM region filtering with unnamed axes may fail for complex regions, e.g.
dmlist my_img.fits[circle(10,10,10)-box(0,0,30,30)]
The workaround is to explicitly use the axis name, e.g.
dmlist my_img.fits[sky=circle(10,10,10)-box(0,0,30,30)]
When filtering on WCS columns, the range is computed by converting the range of the parent columns and using the result as the limits of the WCS columns. When the transform is highly non-linear, e.g. the TAN-P transform used to go from DETX,DETY to THETA,PHI, this can lead to incorrect limits and incorrect filters. Users who want to filter on WCS columns should give explicit ranges and not rely on the computed minima and maxima.
bad% dmcopy "evt.fits[theta=:1]"
good% dmcopy "evt.fits[theta=0:1]"
When region-filtering images, you can create a vector on the fly from any two axes by using a filter like "(#1,#3)=circle(...)". Although the image is filtered correctly with a temporary vector, the region filter isn't recorded in the subspace. Hence, tools that use the filtered file don't know that pixels outside the filter region are invalid. As a result, dmstat reports no nulls in the filtered image (unless you explicitly tell the DM to set pixels outside the filter to null by using "opt null=...").
Trying to use one column that is part of a vector column in a region expression can lead to a crash:
% dmlist hrcf04482N003_evt2.fits"[(tg_lam,tg_d)=field()]" counts
4381489
# 24359: Received error signal SIGSEGV-segmentation violation.
# 24359: An invalid memory reference was made.
# 24359: segmentation fault: DMLIST (1) is: exit_upon_error->NULL
Here tg_d is part of the rd = (tg_r,tg_d) vector column.
The only workaround is to create a temporary file that dismantles the vector column by removing the other column:
% dmcopy hrcf04482N003_evt2.fits"[cols -tg_r]" tmp_evt
% dmlist tmp_evt"[(tg_lam,tg_d)=field()]" counts
4381489
Filtering on an array column is not supported. The tool will run; however, the results are unpredictable.
ASCII Kernel
SIMPLE ASCII files cannot have 0 rows. If there are 0 rows, then the column header definition row is incorrectly read. For example:
unix% echo "#foo" > bar
unix% dmlist bar cols
--------------------------------------------------------------------------------
Columns for Table Block bar
--------------------------------------------------------------------------------
ColNo  Name                 Unit        Type             Range
   1   foo                              String[4]
unix% dmlist bar data
--------------------------------------------------------------------------------
Data for Table Block bar
--------------------------------------------------------------------------------
ROW    foo
     1 #foo
unix% dmlist bar counts
1
The SIMPLE ASCII file should have 0 rows, but the data option shows that the column definition row has been read in as data.
In the DM, you can normally do
unix% dmlist evt.fits"[cols ra,dec]" data
even though RA and Dec are just coordinate systems defined on the X and Y columns in the file; the DM applies the transform on the fly. This doesn't work yet for ASCII files.
DTF-FIXED header lines may be up to 1024 characters long. However, if the keyword is longer than the FITS standard, the comment is truncated.
unix% input.txt output.dtf'[opt kernel=text/dtf-fixed]'
In input.txt:
TTYPE14 = 'Class' / LV Class Exo: M = missile [B = tactical ballistic missile (except Redstone) apo=80:200] R = research rocket O = orbital LV V = RTV Y = Exo weather rocket X = Big test rocket D= Deep space launch
In output.dtf:
TTYPE14 = "Class " / LV Class Exo: M = missile [B = tactical ballistic missile (except Redstone) apo
ASCII files may contain keyword and column descriptions longer than 80 characters. This can cause random failures in the DM (unable to open columns, wrong data type, etc).
Binning & Rebinning Images
For example:
unix% dmcopy acis.img"[bin x=::5,y=::6]" acis5x6.img
Using the same value for both axes works correctly:
unix% dmcopy acis.img"[bin (x,y)=::5]" acis5.img
unix% dmextract
Input event file (ccd3.sky4.fits[y=3767:][bin sky=annulus(3786,3767,0:380:4)]):
Enter output file name (rprof.fits):
# dmextract (CIAO 4.0 Beta 2): WARNING: Input file, "ccd3.sky4.fits[y=3767:]", has no rows in it.
Bus error
For images without a physical coordinate system, the DM will internally create one. This is not written to the output file, which may lead to errors if the image size changes due to spatial filtering, because the output file then has a different physical WCS from the input file.