Fitting with spectral-cube and astropy¶
Authors¶
Eric Koch, Adam Ginsburg, Tom Robitaille
Learning Goals¶
- Fitting 1D models to spectra in a spectral-line data cube
- Fitting 2D models to spectral channels or 2D projections
- Enabling dask mode for large data cubes
Keywords¶
radio astronomy, spectral-line data cubes, spectral fitting, spatial fitting
Summary¶
This tutorial demonstrates methods to fit models to a SpectralCube
object using astropy.modeling.
Requires¶
pip install astropy spectral-cube dask aplpy
%matplotlib inline
import warnings
import numpy as np
import matplotlib.pyplot as plt
import astropy.units as u
Fitting models to both spatial maps and spectra are amongst the most common operations involving spectral-line data cubes.
In this tutorial, we explore how to fit models to a SpectralCube
using astropy.modeling
. To show this, we will use an ALMA mosaic observed with the ACA, or 7-m array. This mosaic is one component of an ALMA project to map CO(2-1) in M33 (Project ID 2019.1.01182.S).
You can access the tutorial data cubes here on Zenodo.
To keep the data volume small, we will use a 10 MB cutout of the full mosaic for this tutorial.
from astropy.utils.data import download_file
datafile = download_file(
'https://zenodo.org/record/4021108/files/M33_Brick1Tile1_12CO21_0p7kms.image.pbcor_K.cutout.fits',
cache=True, show_progress=True)
from spectral_cube import SpectralCube
# Note that the datafile will not end in ".fits". Because of that, we need to specify the format
# When the file name end in "fits", `format` will not need to be specified
cube = SpectralCube.read(datafile, format='fits')
The basic cube properties are:
cube
SpectralCube with shape=(300, 64, 64) and unit=K: n_x: 64 type_x: RA---SIN unit_x: deg range: 23.408277 deg: 23.428647 deg n_y: 64 type_y: DEC--SIN unit_y: deg range: 30.750561 deg: 30.768065 deg n_s: 300 type_s: VRAD unit_s: m / s range: -280000.000 m / s: -79670.000 m / s
Note that the unit of the spectral axis is in m/s from the FITS file. It will be convenient to instead use km/s, so we will convert the cube's spectral unit:
cube = cube.with_spectral_unit(u.km / u.s)
Fitting a 1D spectral model¶
This part demonstrates how 1D spectra can be fit to a SpectralCube
. This spectral-line data cube is of CO(2-1) emission that primarily arises from giant molecular clouds (GMCs). In GMCs, we expect the CO(2-1) line shape to be set by random turbulent motions that produces a Gaussian line shape when observed on $\sim40$ pc scales, like for these observations. Spectra can also be a combination of Gaussian line profiles if multiple sources are detected along a line-of-sight.
Here, we will extract a single spectrum with bright CO emission for this example. The pixel (y, x) = (145, 340) is a location with bright CO emission.
We can extract this one spectrum from the cube and plot it:
# One example to test this on to start
y, x = 32, 32
spec = cube[:, y, x]
spec.quicklook()
This spectrum has a reasonably high signal-to-noise ratio and should provide a good example for spectral fitting. Note that spectra with a low peak signal-to-noise may require restrictions on the parameter space for the fit to converge.
Based on the above spectrum, we can provide good initial parameter values to start the fit with.
from astropy.modeling import models, fitting
# The 1D Gaussian model with initial guesses for parameters
g_init = models.Gaussian1D(amplitude=1.0 * u.K, mean=-210 * u.km / u.s, stddev=4. * u.km / u.s)
# And fit with the Levenberg-Marquardt algorithm and least squares statistic.
fit_g = fitting.LevMarLSQFitter()
# The initial model, spectral axis (in km/s) and spectrum are passed for the fit
g_fit = fit_g(g_init, spec.spectral_axis, spec)
The fit model parameters are:
g_fit
<Gaussian1D(amplitude=1.09745703 K, mean=-211.55635877 km / s, stddev=4.18782779 km / s)>
These parameters all seem reasonable given the plotted spectrum above. To compare more closely, we can plot the spectrum and the fit model:
spec.quicklook()
plt.plot(spec.spectral_axis, g_fit(spec.spectral_axis))
plt.ylabel("Brightness Temperature (K)")
plt.xlabel("Radio Velocity (km /s)")
Text(0.5, 0, 'Radio Velocity (km /s)')
And the fit residuals:
plt.plot(spec.spectral_axis, spec - g_fit(spec.spectral_axis))
plt.ylabel("Brightness Temperature (K)")
plt.xlabel("Radio Velocity (km /s)")
WARNING: SliceWarning: Slice (slice(None, None, None), None) cannot be used on this 1-dimensional array's WCS. If this is intentional, you should use this <class 'spectral_cube.lower_dimensional_structures.OneDSpectrum'>'s ``array`` or ``quantity`` attribute. [spectral_cube.lower_dimensional_structures]
Text(0.5, 0, 'Radio Velocity (km /s)')
Some residuals remain, however, the fit is fairly good.
One major feature of astropy.modeling.models
is the built-in handling of units. The fitted model g_fit
already has each parameter in useful physical units.
Exercise¶
Can the above be example fit be improved with a more sophisticated model? Try the fit using multiple Gaussian components.
Using dask¶
When using a large spectral-line data cube, the dask library can be used in spectral-cube for operations instead of numpy. See the spectral-cube documentation on dask integration for a thorough description.
Most of the above code for fitting a spectrum will be unchanged. What will change is how the data cube is read in:
cube = SpectralCube.read(datafile, use_dask=True, format='fits')
cube = cube.with_spectral_unit(u.km / u.s)
Using dask is enabled with use_dask=True
. This will return a DaskSpectralCube
:
cube
DaskSpectralCube with shape=(300, 64, 64) and unit=K and chunk size (300, 64, 64): n_x: 64 type_x: RA---SIN unit_x: deg range: 23.408277 deg: 23.428647 deg n_y: 64 type_y: DEC--SIN unit_y: deg range: 30.750561 deg: 30.768065 deg n_s: 300 type_s: VRAD unit_s: km / s range: -280.000 km / s: -79.670 km / s
Of note is the new information for the chunk size
that is defined for a DaskSpectralCube
but not the SpectralCube
at the beginning of the tutorial. Dask will load the data in chunks, and the size of the chunks for this data cube are shown as the chunk size
.
For this tutorial, we are using a small cube and so there is only 1 chunk in the DaskSpectralCube
. For larger cubes, however, there will be multiple chunks defined. The idea of these chunks is that the data can be loaded in parts to avoid reading the entire cube into memory at once.
Dask provides a task scheduler that can be used to parallelize operations. For this tutorial, we will run the fitting one a single-core, or the synchronous
scheduler in dask. To set this for our SpectralCube, we use the use_dask_scheduler
function:
cube.use_dask_scheduler('synchronous')
<spectral_cube.dask_spectral_cube.DaskSpectralCubeMixin.use_dask_scheduler.<locals>.SchedulerHandler at 0x7fa3e5a14e50>
See the integration with dask documentation page for more information on setting different schedulers and enabling parallel operations.
To track the progress of operations (particularly when operating on a large SpectralCube), dask provides a progress bar. To enable the progress bar in a notebook or terminal, use:
from dask.diagnostics import ProgressBar
pbar = ProgressBar()
pbar.register()
Fitting with the DaskSpectralCube
looks similar to the example above:
spec = cube[:, y, x]
# The 1D Gaussian model with initial guesses for parameters
g_init = models.Gaussian1D(amplitude=1.0 * u.K, mean=-210 * u.km / u.s, stddev=4. * u.km / u.s)
# And fit with the Levenberg-Marquardt algorithm and least squares statistic.
fit_g = fitting.LevMarLSQFitter()
# The initial model, spectral axis (in km/s) and spectrum are passed for the fit
g_fit = fit_g(g_init, spec.spectral_axis, spec)
spec.quicklook()
plt.plot(spec.spectral_axis, g_fit(spec.spectral_axis))
[ ] | 0% Completed | 202.78 us
[########################################] | 100% Completed | 101.33 ms
[ ] | 0% Completed | 115.36 us
[########################################] | 100% Completed | 100.96 ms
[<matplotlib.lines.Line2D at 0x7fa410ec0310>]
The result is identical, but we now see multiple progress bars printed. The bar is shown each time dask reads in or performs an operation. In this case, we're seeing dask read in the "chunk" of data that contains the spectrum we're fitting.
Using dask in this case is meant to be a working demonstration. The true value of using DaskSpectralCube
is when many spectra in the cube require fitting. Dask allows for two features here: (i) parallelize fitting spectra to speed up the operation, and (ii) reading in the data in chunks to avoid excessive memory usage, which is critical when working with data cubes larger than your machine's memory. See the ADD LINK TO PARALLEL FITTING TUTORIAL on fitting all spectra in a cube using DaskSpectralCube
.
Fitting a 2D spatial model¶
Using the same CO data, we will next cover how to fit a 2D Gaussian to a source and a simplistic approach to identify peaks. To show this, we will make a 2D velocity-integrated CO intensity map, or the moment 0 map. See ADD LINK TO MOMENTS TUTORIAL on moment maps and how to make them.
For this example, we will focus on the same giant molecular cloud (GMC) as the 1D spectrum above.
To make an optimal moment 0 map, we should mask noisy regions of the data to highlight the signal. Here we will use a simple minimum threshold and integrate over the velocities corresponding to the signal in the 1D spectrum above ($\sim-230$ to $-200$ km/s):
# Mask noisy values
masked_cube = cube.with_mask(cube > 0.12 * u.K)
# Only integrate over channels with signal
masked_cube_slab = masked_cube.spectral_slab(-230 * u.km / u.s, -190 * u.km / u.s)
moment0 = masked_cube_slab.moment0()
[ ] | 0% Completed | 104.49 us
[########################################] | 100% Completed | 101.18 ms
These sources are giant molecular clouds (GMCs) that appear "blob-like" at this resolution. A 2D Gaussian should be a reasonable model for each GMC. Note that this shape is primarily the "beam," the resolution element for these observations taken with the ALMA's (Morita) Compact Array.
For this example, we will focus on fitting a model to a single GMC; the same GMC where we fit the 1D spectrum above. To fit the model, we will slice out a small regions centered on the cloud:
y, x = 32, 32
size = 13
moment0_cutout = moment0[y-size:y+size, x-size:x+size]
moment0_cutout.quicklook(use_aplpy=True)
Most of the map is a single GMC ("blob") with some emission from a neighbouring GMC at the north edge. This cut-out should be well-suited to fit a 2D Gaussian to.
The model to fit is defined below. To help the fit converge to the expected values, we will approximate the starting parameter values based on the image above.
We can also fit directly to the RA, Dec values since the modeling handles the units (and we're looking at a very small area so the sky curvature does not matter in this case). The Dec and RA values are returned for each pixel using spatial_coordinate_map
(in order of Dec then RA).
from astropy.modeling import models, fitting
# Define the spatial grid for the fit centered at y, x = 32, 32
yy, xx = moment0_cutout.spatial_coordinate_map
# Define a single 2D Gaussian model.
p_init_gauss2D = models.Gaussian2D(x_mean=xx[size, size], y_mean=yy[size, size],
amplitude=12 * u.K * u.km / u.s,
x_stddev=8 * u.arcsec, y_stddev=8 * u.arcsec)
# And fit with the Levenberg-Marquardt algorithm and least squares statistic.
fit_p = fitting.LevMarLSQFitter()
# TODO: should be able to use with_fill_value for projections
# fill value is NOT working for Projection.
# See https://github.com/radio-astro-tools/spectral-cube/pull/661
# mom0_sub.with_fill_value(fill_value=0.0).filled_data[:]
# Set NaNs to 0 for the fit.
moment0_cutout_quant = moment0_cutout.quantity
# moment0_cutout_quant = moment0_cutout.value
moment0_cutout_quant[np.isnan(moment0_cutout_quant)] = 0.
with warnings.catch_warnings():
# Ignore model linearity warning from the fitter
warnings.simplefilter('ignore')
p_gauss2D = fit_p(p_init_gauss2D, xx, yy, moment0_cutout_quant)
The fitted model is:
p_gauss2D
<Gaussian2D(amplitude=12.87132516 K km / s, x_mean=23.41749457 deg, y_mean=30.75970478 deg, x_stddev=0.00136876 deg, y_stddev=0.00100932 deg, theta=-0.46608039 rad)>
This corresponds to a spatial size of:
print(f"{p_gauss2D.x_stddev.quantity.to(u.arcsec)} by {p_gauss2D.y_stddev.quantity.to(u.arcsec)}")
4.927549702771874 arcsec by 3.6335649764546365 arcsec
The parameters are similar to the cutout image shown above: the peak is about 12 K km/s and the centre of the GMC is near the centre of the cutout.
To visualize the quality of the fit, we can compare the data, model, and residuals:
plt.figure(figsize=(18, 6))
plt.subplot(1, 3, 1)
plt.title("Image", fontsize=18)
plt.imshow(moment0_cutout.value, origin='lower', cmap='inferno')
plt.colorbar()
plt.subplot(1, 3, 2)
plt.title("Model", fontsize=18)
plt.imshow(p_gauss2D(xx, yy).value, origin='lower', cmap='inferno')
plt.colorbar()
plt.subplot(1, 3, 3)
plt.title("Residual", fontsize=18)
plt.imshow(moment0_cutout.value - p_gauss2D(xx, yy).value, origin='lower', cmap='inferno', vmin=-0.7, vmax=0.7)
plt.colorbar()
plt.tight_layout()
The fit agrees fairly well with the data, with the largest residuals coming from the edge of the map where another GMC is located.
This example shows the simplest approach to fitting a 2D model based on a 2D spatial projection. A more sophisticated approach would take the telescope beam into account, as in this case, the beam accounts for most of the size we recover. This should be possible by using the astropy.convolution.convolve_models function and fixing the model parameters for the beam.
Using Dask¶
2D Projection
and Slice
objects do not currently have dask implementations.