Astrophysics & AI with Python: Unlocking the Universe with Astroquery

A developer demonstrates how to use the Python library Astroquery to programmatically access astronomical data from multiple archives, solving the heterogeneity problem of different APIs. The tutorial shows how to resolve coordinates for the Andromeda Galaxy using NED and query the MAST archive for Hubble Space Telescope observations, integrating with Astropy for unit handling and coordinate transformations.

The universe is no longer just observed through a physical telescope eyepiece; it is read, parsed, and analyzed through code. For the modern data-driven astronomer, the sky is a massive, distributed database. However, accessing this data presents a unique challenge: the "Babel of Archives." How do you programmatically search the accumulated knowledge of humanity when that knowledge is scattered across dozens of independent institutions, each with its own query language, format, and API?

The answer is Astroquery. This powerful Python library serves as the universal translator for the Virtual Observatory, turning complex web requests into simple function calls. In this guide, we will explore the theoretical foundations of this tool and walk through a practical script to fetch Hubble Space Telescope data for the Andromeda Galaxy.

Modern astronomy is defined by the data deluge. From the Hubble Space Telescope (HST) to the James Webb Space Telescope (JWST) and the Gaia mission, we are collecting petabytes of data. But this data isn't stored on a single central server. It is housed in specialized archives, such as NED and MAST, the two used in this guide. If you wanted to find all data on M31 (the Andromeda Galaxy), you would historically need to write a custom API wrapper for each archive. This is the Heterogeneity Problem.

Think of `astroquery` as a Universal Research Librarian. You give it a simple instruction in Python, and it performs the complex, hidden work behind the scenes: sending the HTTP requests, parsing the responses, and returning the results in a standard form.

Crucially, `astroquery` integrates tightly with `astropy.coordinates`.
Together they handle unit conversions and reference frame transformations, such as precessing coordinates from J2000 to another epoch, which eliminates a major source of error in scientific research.

Let's put theory into practice. In this example, we will perform the standard two-step astronomical query: first resolve the object's name to sky coordinates using NED, then search MAST for HST observations around those coordinates.

```python
import sys

import astropy.units as u
from astropy.coordinates import SkyCoord
from astroquery.mast import Observations
from astroquery.ned import Ned

# --- PART 1: Coordinate Resolution using NED ---

# 1. Define the target object name.
TARGET_NAME = "M31"
print(f"--- 1. Resolving Coordinates for {TARGET_NAME} using NED ---")

try:
    # Query NED for the object. The result is an Astropy Table.
    ned_result_table = Ned.query_object(TARGET_NAME)
except Exception as e:
    print(f"Error querying NED for {TARGET_NAME}: {e}")
    sys.exit(1)

# 2. Extract RA and Dec (in decimal degrees).
# Recent astroquery releases name these columns 'RA' and 'DEC';
# older releases used 'RA(deg)' and 'DEC(deg)'.
ra_col = "RA" if "RA" in ned_result_table.colnames else "RA(deg)"
dec_col = "DEC" if "DEC" in ned_result_table.colnames else "DEC(deg)"
try:
    ra_deg = ned_result_table[ra_col][0]
    dec_deg = ned_result_table[dec_col][0]
except (IndexError, KeyError):
    print(f"Error: NED returned no usable result for {TARGET_NAME}.")
    sys.exit(1)

# 3. Create a standardized SkyCoord object with units.
target_coord = SkyCoord(ra=ra_deg * u.degree, dec=dec_deg * u.degree, frame="icrs")
print(f"Resolved Coordinates: RA={target_coord.ra.deg:.4f} deg, Dec={target_coord.dec.deg:.4f} deg")

# --- PART 2: Querying the MAST Archive ---

# 4. Define the search radius. M31 is large, so we use a generous radius.
search_radius = 0.5 * u.degree
print(f"\n--- 2. Querying MAST for HST Observations within {search_radius} of M31 ---")

# 5. Query MAST using the coordinates and radius.
# In astroquery.mast, query_criteria is a method of the Observations class.
mast_observations = Observations.query_criteria(
    coordinates=target_coord,
    radius=search_radius,
    obs_collection="HST",  # Filter for Hubble data only
)

# 6. Display the results.
if mast_observations is not None and len(mast_observations) > 0:
    print(f"\nSuccess! Found {len(mast_observations)} HST observations.")
    print("\nMetadata Summary (First 5 entries):")
    # Select specific columns for a clean summary
    summary_data = mast_observations["obsid", "instrument_name", "t_exptime", "filters"][:5]
    print(summary_data)
else:
    print("\nNo HST observations found.")

print("\nQuery process complete.")
```

We import `astropy.units` (aliased as `u`) and `SkyCoord`. In modern astronomical coding, units are mandatory. Passing a raw number like `0.5` is dangerous: is that 0.5 degrees, radians, or arcseconds? By writing `0.5 * u.degree`, we create a unit-aware object that `astroquery` understands unambiguously.

The call `Ned.query_object("M31")` sends a request to the NASA/IPAC Extragalactic Database. It returns an Astropy `Table` containing metadata (redshift, object type, etc.). We extract the right ascension and declination columns and index them with `[0]`, because even a single-name query returns a table (a list of rows). We grab the first row as the primary match.

We wrap the raw numbers into `target_coord = SkyCoord(...)`. This object is the currency of the Astropy ecosystem. It carries not just the numbers, but also the units (`u.degree`) and the frame (`icrs`, the International Celestial Reference System).

We use `Observations.query_criteria`. This is the Swiss Army knife of MAST queries. We pass it `coordinates=target_coord`, `radius=search_radius`, and `obs_collection="HST"`: the center of the search, its radius, and a filter that restricts the results to Hubble data.

The result is an Astropy `Table`. For astronomy, this has an advantage over a standard Pandas DataFrame: it can carry scientific metadata, such as the units and descriptions of its columns. We slice the table to show the first 5 entries and specific columns (`obsid`, `instrument_name`, `t_exptime`, `filters`) to keep the output readable.

The most common error for beginners is forgetting `astropy.units`.

Incorrect: `search_radius = 0.5` (just a float)

Correct: `search_radius = 0.5 * u.degree` (a physical quantity)

If you pass a bare number, the library must either reject it or guess the unit. Depending on the service, it may raise an error or silently assume a default such as degrees, and neither outcome is what you want.
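To make the ambiguity concrete, here is a minimal sketch that uses only the standard library and is not part of the original script. It reads the same bare number, 0.5, as degrees, radians, and arcseconds. The helper `radius_in_degrees` and the `TO_DEGREES` table are hypothetical names introduced for illustration; in real code, `astropy.units` does this bookkeeping for you.

```python
import math

# Conversion factors from each angular unit to degrees.
# (Hypothetical lookup table for illustration; astropy.units replaces this in practice.)
TO_DEGREES = {
    "degree": 1.0,
    "radian": 180.0 / math.pi,
    "arcsecond": 1.0 / 3600.0,
}


def radius_in_degrees(value, unit):
    """Convert an angular radius to degrees; the unit must be stated explicitly."""
    if unit not in TO_DEGREES:
        raise ValueError(f"unknown angular unit: {unit!r}")
    return value * TO_DEGREES[unit]


# The same bare number, interpreted three different ways.
for unit in TO_DEGREES:
    print(f"0.5 {unit:<9} = {radius_in_degrees(0.5, unit):.6f} deg")
```

The three readings differ by several orders of magnitude, which is the difference between a search cone that covers a whole galaxy and one that barely covers a single star. With Astropy, the equivalent conversion is a one-liner such as `(0.5 * u.radian).to(u.degree)`.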
Always use units!

`astroquery` is more than a convenience wrapper; it is the glue that holds the fragmented world of astronomical archives together. By abstracting away the complexities of HTTP requests, XML parsing, and coordinate transformations, it allows researchers to focus on the science rather than the plumbing. Whether you are building a training set for an AI model or analyzing the spectral energy distribution of a galaxy, `astroquery` provides the standardized, programmatic access required for reproducible, modern science.

How would you use `astroquery` to programmatically curate a balanced training dataset of spiral vs. elliptical galaxies?

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook Astrophysics & AI: Building Research Agents for Astronomy, Cosmology, and SETI. You can find it here: http://tiny.cc/PythonAstrophysics

Check out the other 50 Programming & AI ebooks (Python, TypeScript, Swift, C) here: http://tiny.cc/ProgrammingBooks