Synopsis
Filtering tables and images with the Data Model
Description
The CIAO Data Model provides a flexible filtering syntax which may be applied to table (event) and image files. This document describes many varieties of filtering, organized by the input file and data type or coordinates.
A few examples of region filtering are included; refer to the dmregions help file for detailed information and further examples. Users may also be interested in filtering using 2D pixel masks as described in the dmmasks help file.
For a general introduction to Data Model syntax, see "ahelp dm".
Table of Contents
- 1. Table filtering on columns with real data types
- 2. Table filtering on columns with integer data types
- 3. Table filtering on columns with character string data types
- 4. Table filtering on columns with bit data type
- 5. Table filtering on columns with logical data type
- 6. Table filtering on vector columns
- 7. Compound filters
- 8. Exclude filters
- 9. Filters defined in files: using the @ syntax
- 10. Image filtering on logical coordinates
- 11. Image filtering on physical coordinates
- 12. Image filtering on world coordinates
- 13. Difference between image filtering and table filtering
1. Table filtering on columns with real data types
The simplest kind of filter is a range filter
[filter energy=1000:2000]
which specifies that the DM should only see rows in the input file which satisfy 1000.0 <= energy < 2000.0. You can use this filter by appending it to a filename or block name in any of the CIAO tools:
unix% dmcopy "a.fits[filter energy=1000:2000]" b.fits
unix% dmcopy "a.fits[events][filter energy=1000:2000]" b.fits
Note that the prefix "filter" is optional:
unix% dmcopy "a.fits[energy=1000:2000]" b.fits
You can filter on multiple quantities:
unix% dmcopy "a.fits[energy=1000:2000,time=5410300:5410320]" b.fits
You can also filter on multiple ranges for each quantity:
unix% dmcopy "a.fits[energy=1000:2000,4000:8000,grade=0,2:4,6]" b.fits
This filter accepts rows which have energies between either 1000 and 2000 or 4000 and 8000, and grades equal to 0, 2 to 4, or 6. Note that you can leave out the colon if the min and max of a range are the same, so 0:0 becomes just 0. If you want to express "less than" and "greater than", you can just retain the colon but omit the min or max, so
[energy=:4000]
means accept energies below 4000; whilst
[detx=:511,513:]
means accept all values of detx except 511.0 <= detx < 513.0.
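The range rules for real-valued columns can be modelled in a few lines of Python. This is a minimal sketch of the semantics described above, not the Data Model's implementation; the function name `in_real_ranges` and the use of `None` for an omitted range end are illustrative choices.

```python
import math

def in_real_ranges(value, ranges):
    """Return True if a real column value passes a range list.

    Models a filter such as energy=1000:2000,4000:8000. Each range is a
    (lo, hi) pair where None stands for an omitted end (":4000", "513:").
    A value passes if it lies in ANY range, using lo <= value < hi.
    """
    if not math.isfinite(value):
        # NaN, Inf and -Inf never pass an ordinary range filter
        return False
    for lo, hi in ranges:
        if (lo is None or value >= lo) and (hi is None or value < hi):
            return True
    return False

energy_ranges = [(1000.0, 2000.0), (4000.0, 8000.0)]  # energy=1000:2000,4000:8000
print(in_real_ranges(1500.0, energy_ranges))  # True
print(in_real_ranges(2000.0, energy_ranges))  # False: upper end is excluded
print(in_real_ranges(3000.0, energy_ranges))  # False: between the ranges

detx_ranges = [(None, 511.0), (513.0, None)]          # detx=:511,513:
print(in_real_ranges(512.0, detx_ranges))     # False
print(in_real_ranges(513.0, detx_ranges))     # True
```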
NaN/NULL values
Columns may contain the IEEE special values NaN, Inf, and -Inf. They may also contain integer, string, or logical NULL values (special values used to identify missing or otherwise invalid data). These values are not included when any filter is used, so
[energy=4000:]
would omit any rows where the energy value is Inf. You can filter out all IEEE special values by using
[energy=:]
an open range which includes all finite values. The IEEE special values can be included using
[energy=Null,energy=4000:]
where the 'Null' token is used to identify all NaN or NULL values. The Null token can be used in combination with other range or list filters as shown.
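The special-value rules can be added to the same kind of model. In this sketch the names `is_special` and `passes` are hypothetical, an integer, string or logical NULL is represented by Python's `None`, and it is assumed that the Null token matches every value an ordinary range rejects (NaN, Inf, -Inf and NULL), which is how the text above reads but should be checked against the DM if Inf handling matters.

```python
import math

def is_special(value):
    """True for a NULL (modelled as None) or an IEEE special value."""
    return value is None or not math.isfinite(value)

def passes(value, ranges, include_null=False):
    """Model of [energy=4000:], [energy=:] and [energy=Null,energy=4000:].

    Assumption: the Null token accepts every special value.
    """
    if is_special(value):
        return include_null
    return any((lo is None or value >= lo) and (hi is None or value < hi)
               for lo, hi in ranges)

values = [3000.0, 5000.0, float("inf"), float("nan"), None]
print([passes(v, [(4000.0, None)]) for v in values])
# energy=4000:          -> [False, True, False, False, False]
print([passes(v, [(None, None)]) for v in values])
# energy=:              -> [True, True, False, False, False]
print([passes(v, [(4000.0, None)], include_null=True) for v in values])
# energy=Null,energy=4000: -> [False, True, True, True, True]
```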
2. Table filtering on columns with integer data types
The interpretation is a little different depending on whether the table column has integer or real data type. For an integer data type column,
[energy=4000:5000]
means (4000 <= energy <= 5000), in other words both ends of the range are included.
The syntax also supports the special pseudo-column #row, the row number of the unfiltered file:
[#row=100:200]
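For comparison with the real-column rule, here is a minimal sketch of the integer rule, where both ends of a range are inclusive. The function name is illustrative only; with the real-column rule the value 5000.0 would be rejected by the same range.

```python
def in_int_ranges(value, ranges):
    """Integer columns: both ends of each range are included (lo <= value <= hi)."""
    return any((lo is None or value >= lo) and (hi is None or value <= hi)
               for lo, hi in ranges)

print(in_int_ranges(5000, [(4000, 5000)]))   # True: upper end included
print(in_int_ranges(5001, [(4000, 5000)]))   # False
# Number of integer values 3000..6999 accepted by energy=4000:5000
print(sum(in_int_ranges(v, [(4000, 5000)]) for v in range(3000, 7000)))  # 1001
```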
3. Table filtering on columns with character string data types
Filtering is a little more restricted with character string columns. You can only use the colon, not > or <:
[filter shape=m:n,s:z]
The range m:n includes everything beginning with m, and it includes the letter n, but not other strings beginning with n: for example, "ngc" is not within m:n since in an ASCII ordering ngc > n. The comparison is case-sensitive.
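The string rule amounts to a closed interval in case-sensitive ASCII order. Python compares strings by code point, which matches ASCII ordering for ASCII text, so a sketch (with an illustrative function name) is short:

```python
def in_string_ranges(value, ranges):
    """Closed string ranges compared in case-sensitive ASCII order."""
    return any(lo <= value <= hi for lo, hi in ranges)

shape_ranges = [("m", "n"), ("s", "z")]   # shape=m:n,s:z
results = {s: in_string_ranges(s, shape_ranges)
           for s in ["m", "mzz", "n", "ngc", "Moon", "saturn"]}
print(results)
# {'m': True, 'mzz': True, 'n': True, 'ngc': False, 'Moon': False, 'saturn': True}
```

"Moon" is rejected because uppercase letters sort before lowercase ones in ASCII.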
4. Table filtering on columns with bit data type
Columns with bit data type are a special case. The most common example is the STATUS column in the event files, which is 32 bits wide for Chandra ACIS event files generated between launch and at least 2003 (it may be increased at a later date). Suppose for simplicity that you instead have a status column only 6 bits wide, and you wish to accept rows in which the third bit from the end is set and the last bit is zero. A numeric filter of the type described above would need a long list of values to cover every number whose bits have the desired values; instead, you supply a "bitmask string":
[filter status=xxx1x0]
The string may only contain the characters 1 (corresponding bit must be set), 0 (bit must not be set) and x (wild card: bit may have any value).
The bit pattern display convention here is that the rightmost bit displayed (with the value 0 in the example above) is the least significant bit, which is bit 32 in the usual FITS convention (for a 32-bit column) and bit 0 in the usual C programmer's convention.
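A bitmask string can be reduced to two integers: a mask of the bits that matter and the value those bits must have. This is a sketch of that standard technique using the display convention above (rightmost character is the least significant bit); the function name is illustrative, not part of the DM.

```python
def bitmask_matches(status, pattern):
    """Return True if integer `status` matches a bitmask string.

    '1' = bit must be set, '0' = bit must be clear, 'x' = either.
    The rightmost character of `pattern` is the least significant bit.
    """
    must_be_set = 0   # bits required to be 1
    care = 0          # bits that are not wild cards
    for position, char in enumerate(reversed(pattern)):
        bit = 1 << position
        if char == "1":
            must_be_set |= bit
            care |= bit
        elif char == "0":
            care |= bit
        elif char != "x":
            raise ValueError(f"invalid bitmask character: {char!r}")
    return (status & care) == must_be_set

# status=xxx1x0: third bit from the end set, last bit clear
print(bitmask_matches(0b000100, "xxx1x0"))  # True
print(bitmask_matches(0b000101, "xxx1x0"))  # False: last bit is set
print(bitmask_matches(0b111110, "xxx1x0"))  # True: other bits are wild cards
```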
5. Table filtering on columns with logical data type
Logical data columns contain a value of either true (1) or false (0). When filtering on a "true" value, any of the following will work:
unix% dmlist "srclist.fits[double=1]" data
unix% dmlist "srclist.fits[double=T]" data
unix% dmlist "srclist.fits[double=TRUE]" data
unix% dmlist "srclist.fits[double=true]" data
Likewise, for "false" values:
unix% dmlist "srclist.fits[double=0]" data
unix% dmlist "srclist.fits[double=F]" data
unix% dmlist "srclist.fits[double=FALSE]" data
unix% dmlist "srclist.fits[double=false]" data
6. Table filtering on vector columns
A vector column is a column which consists of two components, such as the DET(DETX,DETY) vector column. There are two ways to filter on a vector column:
a. define a rectangular region by filtering on each of the components separately:
unix% dmcopy "evt.fits[detx=4000:5000,dety=3000:4000]" rectangle.fits
b. use a region filter, either with the vector name ("DET") or the two components in parentheses ("(DETX,DETY)"):
unix% dmcopy "evt.fits[det=circle(4500,3500,120)]" circle.fits
unix% dmcopy "evt.fits[(detx,dety)=circle(4500,3500,120)]" circle.fits
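The two approaches select different shapes in the (DETX,DETY) plane. This sketch models both for a few hypothetical event positions; it assumes real-valued components and, for the circle, that a point exactly on the edge counts as inside (the rule stated for images in section 10).

```python
import math

def in_rectangle(x, y, xlim, ylim):
    # Option a: a range filter on each component (real columns: lo <= v < hi)
    return xlim[0] <= x < xlim[1] and ylim[0] <= y < ylim[1]

def in_circle(x, y, xc, yc, r):
    # Option b: a circle region; the edge is assumed to count as inside
    return math.hypot(x - xc, y - yc) <= r

events = [(4500.0, 3500.0), (4600.0, 3600.0), (4950.0, 3950.0)]
for detx, dety in events:
    print(detx, dety,
          in_rectangle(detx, dety, (4000, 5000), (3000, 4000)),
          in_circle(detx, dety, 4500, 3500, 120))
# 4500.0 3500.0 True True
# 4600.0 3600.0 True False
# 4950.0 3950.0 True False
```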
"ahelp dmregions" has detailed information on region filtering.
7. Compound filters
The DM syntax also supports compound (logical OR) filters:
unix% dmcopy \
  "evt.fits[(ccd_id=2,chipx=512:513)||(ccd_id=7,chipx=500:520)]" \
  two_bits.fits
Note that only lists of filters separated by "||" are supported; arbitrarily-complex C-style logical expressions are not allowed.
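A compound filter is a logical OR of groups, where each group is a comma-separated (AND) list of conditions. This sketch models the example above with rows as dictionaries; the helper names are hypothetical, and `ccd_id` and `chipx` are treated as integer columns, so ranges include both ends.

```python
def int_range(col, lo, hi):
    # Integer column condition: lo <= row[col] <= hi
    return lambda row: lo <= row[col] <= hi

def matches_compound(row, groups):
    # Groups joined by "||" are ORed; conditions inside a group are ANDed
    return any(all(cond(row) for cond in group) for group in groups)

# (ccd_id=2,chipx=512:513)||(ccd_id=7,chipx=500:520)
groups = [
    [int_range("ccd_id", 2, 2), int_range("chipx", 512, 513)],
    [int_range("ccd_id", 7, 7), int_range("chipx", 500, 520)],
]
rows = [
    {"ccd_id": 2, "chipx": 513},
    {"ccd_id": 2, "chipx": 510},
    {"ccd_id": 7, "chipx": 510},
]
print([matches_compound(r, groups) for r in rows])  # [True, False, True]
```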
8. Exclude filters
The DM has the ability to invert a filter in order to exclude rows or pixels instead of including them:
unix% dmcopy "evt.fits[exclude sky=region(reg.ds9)]" holes.fits
unix% dmcopy "evt.fits[exclude pha=2:100,grade=7]" clean.fits
This is particularly useful in conjunction with compound filters (see the previous section):
unix% dmcopy \
  "evt.fits[exclude (ccd_id=0:6,8:9,chipx=513)||(ccd_id=7,chipx=512:513)]" \
  better.fits
Note that the Data Model cannot combine an exclude filter with any kind of include filter. This combination will produce an error, e.g. "[exclude sky=region(mask.fits)][energy=300:5000]".
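One way to read "exclude" is as a negation of the complete filter expression, so a row is dropped only when it satisfies every condition in the list. This sketch assumes that reading (it is not taken from the DM source); the helper names are hypothetical and `pha` and `grade` are treated as integer columns.

```python
def int_range(col, lo, hi):
    # Integer column condition: inclusive at both ends
    return lambda row: lo <= row[col] <= hi

def all_of(*conditions):
    # Comma-separated filters: every condition must hold
    return lambda row: all(c(row) for c in conditions)

def exclude(predicate):
    # Assumed model: "exclude" keeps exactly the rows the filter would reject
    return lambda row: not predicate(row)

# [exclude pha=2:100,grade=7]
keep = exclude(all_of(int_range("pha", 2, 100), int_range("grade", 7, 7)))
rows = [{"pha": 50, "grade": 7}, {"pha": 50, "grade": 0}, {"pha": 200, "grade": 7}]
print([keep(r) for r in rows])  # [False, True, True]
```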
9. Filters defined in files: using the @ syntax
Filter specifications may be applied from a file rather than including them on the command line. This easily allows the same filters to be applied to multiple input files. The filter filename is preceded by an "at" symbol (@).
unix% dmlist "acis_evt2.fits[@filters.lis]" counts
where filters.lis contains:
sky=rotbox(4148.125,4043.625,7.58978,22.338761,44.516094), pi > 100
Each complete filter must be separated with a comma, just as they would be on the command line.
The "@" syntax can also be used to apply the GTIs (Good Time Intervals) from one file to another. To filter evt2.fits using the GTIs stored in gti.fits:
unix% dmcopy "evt2.fits[@gti.fits]" evt2_filtered.fits
10. Image filtering on logical coordinates
This filter on logical coordinates selects a 256 x 100 subset of the image:
unix% dmcopy "im.fits[#1=257:512,#2=1:100]" subset.fits
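Logical ranges count pixels from 1 and include both ends, so 257:512 covers 512 - 257 + 1 = 256 columns. This sketch converts such ranges to 0-based Python slices on a hypothetical 600 x 200 image stored as a list of rows, assuming the usual FITS layout in which #1 runs along a row and #2 selects the row.

```python
def logical_range_to_slice(lo, hi):
    """Turn a 1-based, inclusive logical range such as #1=257:512 into a 0-based slice."""
    return slice(lo - 1, hi)

# Hypothetical 600 x 200 image: 200 rows (#2) of 600 columns (#1)
image = [[0] * 600 for _ in range(200)]

cols = logical_range_to_slice(257, 512)   # #1=257:512
rows = logical_range_to_slice(1, 100)     # #2=1:100
subset = [row[cols] for row in image[rows]]
print(len(subset[0]), len(subset))        # 256 100
```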
A region filter such as
unix% dmcopy "im.fits[(#1,#2)=circle(145,356,25)]" circle.fits
sets everything outside the specified circle to zero (unless the BLANK keyword is defined, in which case that value is used). An image pixel is considered inside the region if the center of the pixel is inside the region; the edge of the region is considered as being inside the region.
By default, a filter like this also shrinks the resulting image to the smallest size that surrounds the circle. To suppress this behavior and create an image the same size as the input file, mostly empty apart from the circle of data, use the "opt full" modifier:
unix% dmcopy "im.fits[(#1,#2)=circle(145,356,25)][opt full]" circle.fits
To explicitly set the value to be used for pixels lying outside the filter, use the 'opt null' directive:
unix% dmcopy "im.fits[(#1,#2)=circle(145,356,25)][opt null=-100]" circle.fits
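The pixel-center rule, the fill value and the two output sizes can be modelled together. This is a sketch on a list-of-rows image, assuming logical pixel (i, j) is centred at (i, j); the `null` argument stands in for 0, BLANK or [opt null=...], and `full` for [opt full]. How the DM chooses the shrunken bounding box is an assumption here (the box around the selected pixels).

```python
def circle_filter(image, xc, yc, r, null=0, full=False):
    """Model of im.fits[(#1,#2)=circle(xc,yc,r)] on a list-of-rows image.

    Logical pixel (i, j) is centred at (i, j): i is #1 (column) and
    j is #2 (row), both counted from 1. A pixel is kept when its centre
    lies inside the circle or on its edge; other pixels get `null`.
    """
    ny, nx = len(image), len(image[0])
    out = [[null] * nx for _ in range(ny)]
    inside = []
    for j in range(1, ny + 1):
        for i in range(1, nx + 1):
            if (i - xc) ** 2 + (j - yc) ** 2 <= r ** 2:
                out[j - 1][i - 1] = image[j - 1][i - 1]
                inside.append((i, j))
    if full or not inside:
        return out  # [opt full], or nothing selected: keep the input size
    # Assumption: shrink to the bounding box of the selected pixels
    i0 = min(i for i, _ in inside)
    i1 = max(i for i, _ in inside)
    j0 = min(j for _, j in inside)
    j1 = max(j for _, j in inside)
    return [row[i0 - 1:i1] for row in out[j0 - 1:j1]]

img = [[1] * 7 for _ in range(7)]
for row in circle_filter(img, 4, 4, 1):
    print(row)
# [0, 1, 0]
# [1, 1, 1]
# [0, 1, 0]
```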
"ahelp dmregions" has detailed information on region filtering. For more information on DM options, see "ahelp dmopt".
CIAO also supports the PROS image subsection syntax for filtering images:
unix% dmcopy "im.fits[1:100,1:200]" pros.fits
11. Image filtering on physical coordinates
To filter on the original physical coordinates, use the physical axis names:
unix% dmcopy "im.fits[x=4096.5:4213.5,y=4142.3:6120.1]" subset.fits
unix% dmcopy "im.fits[(x,y)=circle(4096,4096,12)]" circle.img
This will give you a square image bounding the given circle, and set to zero all pixels outside the circle. If you wish to mark these pixels as "invalid", you can use the "opt null" syntax described in "ahelp dmopt".
An image pixel is considered inside the region if the center of the pixel is inside the region; the edge of the region is considered as being inside the region.
"ahelp dmregions" has detailed information on region filtering.
12. Image filtering on world coordinates
World coordinates may also be used in a region filter:
unix% dmcopy 'im.fits[(x,y)=circle(11:03:28.4,-20:11:23.2,20.1\")]' circ.img
Note: The arcsecond symbol (") works on the command line, but it may cause problems with the parameter tools, such as pset.
It is not possible to use world coordinates when filtering on a one-dimensional range, e.g. [x=11:03:28.2:11:03:32.1]. In this case, use "(x,y)=rectangle(...)" instead.
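The sexagesimal values in the circle example above can be converted to decimal degrees to check what area the region covers. This sketch assumes the usual convention that a sexagesimal right ascension is in hours (15 degrees per hour) and a declination in degrees; the function names are illustrative.

```python
def hms_to_deg(text):
    """Right ascension "hh:mm:ss.s" in hours -> degrees."""
    h, m, s = (float(p) for p in text.split(":"))
    return 15.0 * (h + m / 60.0 + s / 3600.0)

def dms_to_deg(text):
    """Declination "[+-]dd:mm:ss.s" -> degrees (sign applies to the whole value)."""
    sign = -1.0 if text.strip().startswith("-") else 1.0
    d, m, s = (abs(float(p)) for p in text.split(":"))
    return sign * (d + m / 60.0 + s / 3600.0)

ra = hms_to_deg("11:03:28.4")
dec = dms_to_deg("-20:11:23.2")
radius_deg = 20.1 / 3600.0   # 20.1 arcseconds
print(f"{ra:.6f} {dec:.6f} {radius_deg:.6f}")  # 165.868333 -20.189778 0.005583
```

Taking the sign from the text rather than from the degrees field keeps declinations such as -00:30:00 negative.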
13. Difference between image filtering and table filtering
There is a subtle difference between using a region to filter a vector column in a table (see section 6) and using it to filter an image (section 11). When filtering a column in a table, the real-valued, floating-point column values are tested to see if they are inside the region. When filtering an image, the center of each binned pixel is checked. This can lead to large differences, especially when the regions are very small compared to the bin size.
unix% dmcopy 'event.fits[bin x=90:110:1,y=90:110:1]' image.fits
unix% dmlist 'event.fits[(x,y)=circle(100.6,100.6,0.1)]' data
unix% dmlist 'image.fits[(x,y)=circle(100.6,100.6,0.1)]' data
The two dmlist outputs will show different numbers of counts. The first, which filters the table, will show one row for each event located within 0.1 of (100.6,100.6). The second dmlist will always show a [1,1] image with 0 counts, because the region is much smaller than a pixel and does not include the pixel center.
Both values are correct. They are just asking slightly different questions.
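The two questions can be reproduced for a single event. This sketch uses a hypothetical event at (100.62,100.65) and assumes that binning with x=90:110:1 places bin edges at whole numbers, so pixel centers fall at half-integers such as 100.5.

```python
import math

def in_circle(x, y, xc, yc, r):
    return math.hypot(x - xc, y - yc) <= r

def pixel_center(value, lo=90.0, width=1.0):
    # Center of the bin containing `value`, for a binning like x=90:110:1
    k = math.floor((value - lo) / width)
    return lo + (k + 0.5) * width

event = (100.62, 100.65)   # hypothetical event position
print(in_circle(*event, 100.6, 100.6, 0.1))            # True: table test
cx, cy = pixel_center(event[0]), pixel_center(event[1])
print((cx, cy), in_circle(cx, cy, 100.6, 100.6, 0.1))  # (100.5, 100.5) False
```

The event itself lies inside the circle, but the center of the pixel it is binned into lies about 0.14 from the circle center, so the image filter excludes it.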
One may be inclined always to use the value from the event list, since it does not suffer from image quantization effects. However, one needs to be careful to use consistent values in the analysis. For example, suppose you extract counts from the event file but the fraction of the PSF from an image:
unix% dmlist "evt.fits[(x,y)=circle(100,100,1)]" counts
unix% dmstat "psf.fits[(x,y)=circle(100,100,1)]" sig- cen-
unix% pget dmstat out_sum
In this example, the fraction of the PSF is taken over the whole pixel whereas the events are counted only over a fraction of the pixel; thus the psf fraction has been overestimated.
When the regions are large compared to the pixel, this quantization effect matters less. With large numbers of counts and regions that are large compared to the bin size, the difference is often minimal and is usually assumed to be covered by the statistical uncertainty.
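The size of the effect can be estimated by counting whole pixels whose centers fall inside a circle and comparing their area with the circle's area. This sketch assumes unit pixels with edges at integer coordinates (so centers at half-integers, as for a PSF image binned that way); for circle(100,100,1) four whole pixels are selected, an area of 4 against π ≈ 3.14.

```python
import math

def pixels_in_circle(xc, yc, r):
    """Count unit pixels (edges at integer coordinates) whose centers lie in the circle."""
    count = 0
    for i in range(math.floor(xc - r) - 1, math.ceil(xc + r) + 1):
        for j in range(math.floor(yc - r) - 1, math.ceil(yc + r) + 1):
            if math.hypot(i + 0.5 - xc, j + 0.5 - yc) <= r:
                count += 1
    return count

for r in (1, 5):
    n = pixels_in_circle(100, 100, r)
    # radius, selected pixel area, ratio of pixel area to circle area
    print(r, n, round(n / (math.pi * r * r), 3))
# 1 4 1.273
# 5 80 1.019
```

The ratio moves toward 1 as the radius grows relative to the pixel size, which is why the mismatch matters mainly for small regions.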
Bugs
See the bugs page for the Data Model library on the CIAO website for an up-to-date listing of known bugs.
See Also
- calibration: caldb
- chandra: coords, level, pileup, times
- concept: autoname, ciao, ciao-install, history, parameter, stack, subspace
- dm: dm, dmascii, dmbinning, dmmasks, dmopt, dmregions
- paramio: paramio
- tools::coordinates: dmcoords
- tools::core: dmappend, dmcopy, dmextract, dmlist
- tools::image: centroid_map, dmfilth, dmimg2jpg, dmimgadapt, dmimgblob, dmimgcalc, dmimgdist, dmimgfilt, dmimghist, dmimgpick, dmimgpm, dmimgproject, dmimgreproject, dmimgthresh, dmmaskbin, dmmaskfill, dmnautilus, dmradar, dmregrid, dmregrid2, energy_hue_map, evalpos, hexgrid, map2reg, merge_too_small, mkregmap, pathfinder, vtbin
- tools::region: dmcontour, dmellipse, dmimghull, dmimglasso
- tools::response: mean_energy_map, pileup_map
- tools::statistics: dmstat, imgmoment, statmap
- tools::table: dmgroup, dmjoin, dmmerge, dmpaste, dmsort, dmtabfilt, dmtcalc, dmtype2split
- tools::timing: dmgti