Where Do the Data Go? An Analysis of Chandra Data
Dissemination
Reprinted by permission from ADASS XV Proceedings (Escorial, Spain, 2005) ASP
Conf. Ser., G. Gabriel, C. Arviset, D. Ponz, and E. Solano,
eds.
Introduction
The Chandra Data Archive (CDA) has been
distributing data to users since September 1999. Over the
years, four interfaces to the archive have been developed: two
are web-based; one is stand-alone; and one is ftp-based. The
public data in the CDA are mirrored at sites around the world.
Downloads are logged, and since 2002 have been maintained in a
database (Blecksmith et al. 2003). In addition to tracking
the number of files and the volume of data being downloaded,
this database is used to perform meta-analyses. Questions
addressed in this paper include: (1) At what rate are the
applications used relative to each other? (2) Where do users
come from? (3) Is there a link between where a user is from
and which application is used? (4) What sorts of events
trigger downloads of a particular
observation?
Relative Application
Use
Users
can connect to the CDA and browse or retrieve data using a
number of interfaces. At this time no single interface
provides access to all classes of data in the archive.
Consequently, all analyses include only downloads that can be
done through WebChaser, which has the smallest set of
retrievable products: in this study we count only downloads
of public data, excluding engineering observations and
supporting products.
Provisional Retrieval Interface (PRI):
The original interface to the CDA, providing access to public
data, is a CGI script with minimal search capabilities. It is
the only GUI-based interface providing access to engineering
observations.
Chaser:
This is a stand-alone Java
application providing access to proprietary as well as public
data. Users can enter individual selection criteria, as well
as upload criteria from a file. Search results may be viewed
in Chaser or saved to a file. Users may select individual
files which may be downloaded directly or staged to an ftp
area. Chaser is the only interface providing access to
supporting products.
WebChaser:
This is a web-based application
providing access to proprietary and public data. Search criteria
must be entered individually, but WebChaser provides the
most detailed search results, including quick-look images,
V&V reports, and links to processing status and
publications tracked by the CDA operations group. Primary
and secondary data packages are downloaded to an ftp staging
area.
FTP:
This is an anonymous ftp-site
holding primary and secondary products for all public
observations.
An analysis of downloads shows
that the choice of application depends upon the number of
observations downloaded. Most sessions result in fewer than
10 downloaded datasets. When
Chaser and WebChaser came on-line in mid-2001, the usage of the PRI
began a steady decline; however, a not insignificant number
of users continue to use it today. Chaser’s predominant
use is to download supporting products. For large downloads
(more than 500 observations per download) users prefer the
FTP site. For smaller downloads, most users prefer
WebChaser, and users have a slight preference for the FTP
site over the PRI.
FIGURE 28: Relative download application use
for downloads of 10 or fewer and of 500 or more datasets in
a three month period by a single host.
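The size-dependent comparison described above can be sketched in a few lines of Python. This is only an illustration, not the authors' actual analysis code: the record layout (host, application, quarter, datasets downloaded), the sample values, and the function names are all assumptions. The thresholds of 10 and 500 datasets per host per three-month period follow the caption of Figure 28.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (host, application, quarter, datasets_downloaded).
# The values are invented for illustration only.
records = [
    ("host-a", "WebChaser", "2004Q1", 3),
    ("host-a", "FTP", "2004Q1", 2),
    ("host-b", "FTP", "2004Q1", 650),
    ("host-c", "PRI", "2004Q1", 4),
    ("host-d", "WebChaser", "2004Q1", 40),
]

def bucket(total):
    """Classify a host's three-month total using the Figure 28 thresholds."""
    if total <= 10:
        return "small"
    if total >= 500:
        return "large"
    return "medium"

def application_share(records):
    """Return, per size class, the fraction of datasets fetched by each application."""
    # The total per (host, quarter) decides which size class a host falls into.
    totals = Counter()
    for host, _app, quarter, n in records:
        totals[(host, quarter)] += n
    # Sum datasets per application within each size class.
    per_bucket = defaultdict(Counter)
    for host, app, quarter, n in records:
        per_bucket[bucket(totals[(host, quarter)])][app] += n
    # Convert the counts to fractions of the class total.
    return {
        b: {app: n / sum(counts.values()) for app, n in counts.items()}
        for b, counts in per_bucket.items()
    }

shares = application_share(records)
```

Grouping by host and quarter before classifying means a host that spreads a large retrieval over several applications is still counted as one large downloader.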
User
Demographics
While the vast majority of CDA users are from the United
States, we see a steadily increasing number of countries
accessing the archive each year. We’ve gone from serving
22 countries in 2000 to 40 countries in 2004 and have served
57 countries through the course of the mission. Several
countries download significant
volumes of data; some of these
either host or are scheduled to host mirror-sites of the
public data. The data show China in particular to be a
major user of the CDA, making it a good candidate for a
future mirror site.
FIGURE 29: Left: Percentage of Chandra downloads by country for PRI, FTP,
and WebChaser.
Right: Relative
downloads by country for WebChaser in 2004.
It is interesting to note that
there are country-specific preferences for accessing the CDA. For
instance, the majority of users in the UK prefer WebChaser
while users in Japan prefer FTP. This may be related to the
UK becoming a mirror site host in 2002, which improves
response time for local use of WebChaser. Japan is
scheduled to host a mirror site in the near future, which
we hope will make the use of WebChaser, with its superior
search capabilities, more attractive. Users in China seem to
prefer the PRI (with the exception of 2004, when there were
two complete downloads of the archive using ftp). The PRI, a
much simpler application than WebChaser, requires much less network bandwidth and thus may be more appealing in countries with limited bandwidth.
FIGURE 30: Relative usage of PRI, WebChaser,
and ftp for the UK and Japan.
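A per-country comparison of this kind could be computed roughly as follows. The sketch assumes each download record has already been resolved to a country; the record layout, the sample numbers, and the function names are hypothetical and are not taken from the CDA database.

```python
from collections import Counter, defaultdict

# Hypothetical records: (country, year, application, downloads).
# The numbers are invented for illustration only.
records = [
    ("UK", 2003, "WebChaser", 120),
    ("UK", 2003, "FTP", 30),
    ("Japan", 2003, "FTP", 90),
    ("Japan", 2003, "WebChaser", 25),
    ("China", 2003, "PRI", 60),
    ("China", 2003, "WebChaser", 10),
]

def usage_by_country(records):
    """Sum downloads per application for each (country, year) pair."""
    usage = defaultdict(Counter)
    for country, year, app, n in records:
        usage[(country, year)][app] += n
    return usage

def preferred_application(usage):
    """Pick the application with the most downloads for each (country, year)."""
    return {key: counts.most_common(1)[0][0] for key, counts in usage.items()}

def countries_per_year(records):
    """Count the distinct countries seen in each year."""
    seen = defaultdict(set)
    for country, year, _app, _n in records:
        seen[year].add(country)
    return {year: len(countries) for year, countries in seen.items()}

preferences = preferred_application(usage_by_country(records))
served = countries_per_year(records)
```

Keeping the year in the grouping key makes one-off events visible, such as a single year in which bulk ftp retrievals dominate a country's totals.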
Publications and Downloads
What sorts of events trigger downloads of a particular observation? Three obvious events are the public release of data (the date of which is available in the CDA), scientific publication of the data (tracked by the CDA operations group; Rots et al. 2004), and media coverage related to the observation (tracked by the Chandra Education and Public Outreach group). We find that, not surprisingly, all three types of events increase the number of downloads. Most observations are downloaded within days of going public, indicating that users monitor the archive for such events. Scientific publications increase downloads around the time of the publication, more so for the first publication associated with an observation. Press releases have a lesser effect, stimulating downloads only if the release is picked up by many large media services.
FIGURE 31: Time-lines of two Chandra observations showing the effects of public release, scientific publications and media coverage on downloads.
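One simple way to relate downloads to events is to count the downloads of an observation that fall within a fixed window after each event. The sketch below does that for invented dates. The 14-day window, the event names used as keys, and the function name are assumptions made for illustration; the analysis above does not state a window length.

```python
from datetime import date, timedelta

# Hypothetical download dates for a single observation.
downloads = [
    date(2004, 3, 1), date(2004, 3, 2), date(2004, 3, 2),
    date(2004, 6, 10), date(2004, 6, 20), date(2004, 9, 1),
]

# Hypothetical event dates for the same observation.
events = {
    "public release": date(2004, 3, 1),
    "first publication": date(2004, 6, 8),
    "press release": date(2004, 11, 15),
}

def downloads_near(downloads, event_date, window_days=14):
    """Count downloads from the event date through window_days after it."""
    end = event_date + timedelta(days=window_days)
    return sum(event_date <= d <= end for d in downloads)

counts = {name: downloads_near(downloads, when) for name, when in events.items()}
```

Comparing these counts with the download rate outside any window would indicate whether an event coincides with an increase, although a real analysis would also have to handle overlapping events.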
Conclusion
Our analysis of the Chandra downloads database indicates that users generally prefer web interfaces and direct access via ftp to stand-alone applications. Preferences for download applications vary by country, possibly because of varying bandwidth. Public release of a dataset, scientific publication of data, and major press coverage all appear to increase the downloads of an observation near the time of the event.
This work is supported by NASA contract NAS 8-03060.
S. Winkelman, A. Rots, A. Duffy, S. Blecksmith, D. Jerius
Chandra X-ray Center/Smithsonian Astrophysical Observatory
References
Blecksmith, E., Paltani, S., Rots, A. & Winkelman, S. 2003, ADASS XII, 283
Rots, A., Winkelman, S., Paltani, S., Blecksmith, S. & Bright, J. 2004, ADASS XIII, 605