A Summary of the Chandra Data Science Workshop
Rodolfo Montez Jr.
The Chandra X-ray Center completed the virtual science workshop Chandra Data Science: Novel Methods in Computing and Statistics for X-ray Astronomy over a three-week period this past summer (August 17th – September 1st, 2021). The intention of this meeting was to showcase the reach of data science techniques into the high-energy community, with an emphasis on the science enabled by novel techniques in imaging, spectroscopy, and time-domain analysis, as well as the important role of source identification, classification, and cross-matching in large data sets.
The workshop was organized by a wonderful Science Organizing Committee (SOC) co-chaired by Hans Moritz Günther (MIT) and Rodolfo Montez Jr. (CfA). As the regular ex officio SOC member and organizer of CDO-sponsored science workshops over the past five years, I can say that this was one of the most memorable groups I have had the pleasure to work with. The collective knowledge and effort of the SOC honed the content and speaker selection for this unique topic. The meeting featured over 65 speakers: 19 invited talks, 17 contributed talks, and 29 lightning talks (a shorter talk format designed to replace the role of posters), plus additional guest speakers organized by subcommittees of the SOC.
The topics at Chandra Data Science focused on science enabled by rigorous analysis of data from Chandra and other X-ray missions, plus complementary multiwavelength data. In this section, I provide a few of my personal highlights.
Figure 1: Michelle Ntampaka taking us through the levels of a neural network used in her cosmological studies with galaxy clusters.
One of my favorite sessions was the "Thunderstorm session." I call it this not only because it featured a total of 9 talks, 7 of which were lightning talks, but also because thunderstorms were rolling through the northeastern United States at the time. In the midst of that session, Michelle Ntampaka from the Space Telescope Science Institute gave a thought-provoking invited talk, "The Importance of Being Interpretable." Michelle started by asking us to consider why astronomy is an ideal sandbox for machine learning. She then demonstrated how biases enter into studies and affect our interpretation, using her cosmological work with galaxy clusters as an example, and showed how important saliency maps are for accurate interpretation. I highly recommend Michelle's talk to both novice and experienced practitioners who use similar machine learning techniques in their own work.
Figure 2: The unexpected features that an early neural network associates with a cat; here, "has whiskers."
Yuanyuan Su from the University of Kentucky kicked off the second week of science sessions by demystifying neural networks with a concise explanation and showing how they can be used to classify cooling core clusters in synthetic Chandra observations derived from the IllustrisTNG simulation data. Yuanyuan also shared progress on applying machine learning to real observations and prospects for the future. Her talk included a real-world demonstration of how neural networks work and how the features they rely on are not always what we expect (provided by a special junior collaborator and well worth a view!).
Figure 3: Lucia Härer takes us through the high-resolution grating spectroscopy features that identify portions of the clumpy environment of the X-ray binary Cyg X-1.
In the last science session, Lucia Härer from the Dr. Karl-Remeis Observatory unpacked the systematic instrumental effects of chip gaps and spacecraft dithering that must be accounted for in an excess variance analysis of high-resolution grating spectra, and showed how such an analysis can be used to study the clumpy environment of the X-ray binary Cyg X-1.
Of course, I could go on, as all of the sessions were highly informative. All of the talks are available on our YouTube channel and can be accessed via the play button links on the schedule page or via this complete YouTube playlist. I encourage you to review the schedule to find topics that interest you or choose from the playlist at random when you feel like learning about the excellent techniques employed by the X-ray community.
We also featured two coffee chats: short (45-minute) conversations on topics of interest selected by a subcommittee of the SOC (Marie-Lou Gendron-Marsolais, Rodolfo Montez Jr., and Abigail Stevens). The first coffee chat featured Gus Muench and Peter K. G. Williams, representing the AAS, in conversation with SOC member Abigail Stevens on journal production processes relevant to data science interests. The second coffee chat featured two "extronomers" (astronomers who have taken their skills to the data science industry) in conversation with SOC member Koji Mukai on the topic of transitioning to the data science industry.
Out of respect for the privacy of our guests, only the AAS Coffee Chat is available for viewing on the YouTube channel.
In addition, three days of self-organized tutorials followed the scientific sessions. These tutorials gave attendees an opportunity to learn about the latest features of CXC-led projects (SAOImage/ds9, Sherpa, and the Chandra Source Catalog), as well as advanced analysis methods developed by the community: Bayesian X-ray Analysis (BXA) and the Multi-Mission Maximum Likelihood (3ML) framework.
Videos from the tutorials can be found here:
- Bayesian X-ray Analysis – led by Peter Boorman and Johannes Buchner
- Multi-Mission Maximum Likelihood – led by J. Michael Burgess
- SAOImage/ds9 – led by Kenny Glotfelty
- Chandra Source Catalog – led by Rafael Martinez Galarza (also see his article in this issue of the Chandra Newsletter)
- Sherpa – led by Aneta Siemiginowska
The Chandra X-ray Center and the Chandra Director’s Office were extremely grateful for the opportunity to virtually host all of the speakers and attendees. We had just over 400 registrants, about 260 unique live attendees across the 12 science sessions, about 80 unique live attendees for the two coffee chats, and about 90 unique live attendees for the three days of tutorials. The average live attendance for science sessions was 83, with a high of 144 and a low of 48.
The number of unique live attendees is a lower limit on the actual number of viewers, since livestreaming the workshop on YouTube also allowed additional people to watch simultaneously on YouTube, in their own time zone, or even later. The YouTube viewing statistics bear this out, showing notable shifts from North and South American viewers to viewers in Europe and India. Slovakia deserves special mention for several views weeks after the meeting, underscoring the access and impact gained by hosting the talks online.
Format for Future CXC Workshops
Without a doubt, the virtual format of the CXC workshop has helped broaden participation on both sides of the virtual speaker podium. This particular workshop showcased a large number of early career scientists, some of whom may not have had the resources to attend an in-person meeting. On the other hand, a repeated remark in our post-workshop surveys for virtual meetings is that interactive engagement is greatly missed. With the usual caveat about the state of affairs in the world, we currently intend to hold our next workshop in person with a well-integrated virtual component. We hope to provide the best of both the in-person and virtual formats, and we welcome comments and suggestions: email@example.com.
Special thanks to Hans Moritz Günther for being an excellent co-chair of the SOC and stepping in to edit the editor for this article.