Demo: RAIL Evaluation

The purpose of this notebook is to demonstrate the application of the metrics scripts to the photo-z PDF catalogs produced by the PZ working group. The first implementation of the evaluation module is based on a refactoring of the code used in Schmidt et al. 2020, available in the GitHub repository PZDC1paper.

To run this notebook, you must install qp and have the notebook in the same directory as utils.py (available in RAIL's examples directory). You must also install some run-of-the-mill Python packages: numpy, scipy, matplotlib, and seaborn.

Contents

Data

To compute the photo-z metrics of a given test sample, it is necessary to read the output of a photo-z code containing the galaxies' photo-z PDFs. Let's use the toy data available in tests/data/ (test_dc2_training_9816.hdf5 and test_dc2_validation_9816.hdf5) and the configuration file available in examples/configs/FZBoost.yaml to generate a small sample of photo-z PDFs using the FZBoost algorithm available in RAIL's estimation module.

Photo-z Results

Run FZBoost

Go to the directory <your_path>/RAIL/examples/estimation/ and run the command:

python main.py configs/FZBoost.yaml

The photo-z output files (inputs for this notebook) will be written at:

<your_path>/RAIL/examples/estimation/results/FZBoost/test_FZBoost.hdf5.

Let's use the ancillary function read_pz_output to facilitate the reading of all necessary data.
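
A minimal usage sketch, assuming read_pz_output takes the photo-z output file and the validation file and returns the gridded PDFs, the redshift grid, the true redshifts, and the photo-z modes; the argument list and return values here are assumptions, so check utils.py for the actual signature:

```python
# Hypothetical call pattern for the read_pz_output helper from utils.py;
# names and ordering of the returned values are assumptions, not a fixed API.
from utils import read_pz_output

pdfs_file = "<your_path>/RAIL/examples/estimation/results/FZBoost/test_FZBoost.hdf5"
ztrue_file = "<your_path>/RAIL/tests/data/test_dc2_validation_9816.hdf5"
pdfs, zgrid, ztrue, photoz_mode = read_pz_output(pdfs_file, ztrue_file)
```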

The inputs for the metrics below are the array of true (or spectroscopic) redshifts and an ensemble of photo-z PDFs (a qp.Ensemble object).
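
If the PDFs are available only as arrays on a grid, they can be wrapped into an ensemble, e.g. with qp's interpolated parameterization; a sketch assuming `pdfs` is an (n_gal, n_bins) array evaluated on `zgrid`:

```python
import qp

# Build a qp.Ensemble from gridded PDF values; qp.interp interpolates
# between the grid points xvals using the tabulated yvals.
fzdata = qp.Ensemble(qp.interp, data=dict(xvals=zgrid, yvals=pdfs))
```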


Metrics

PIT

The Probability Integral Transform (PIT) is the Cumulative Distribution Function (CDF) of the photo-z PDF

$$ \mathrm{CDF}(f, q)\ =\ \int_{-\infty}^{q}\ f(z)\ dz $$

evaluated at the galaxy's true redshift for every galaxy $i$ in the catalog.

$$ \mathrm{PIT}(p_{i}(z);\ z_{i})\ =\ \int_{-\infty}^{z^{true}_{i}}\ p_{i}(z)\ dz $$

The evaluate method of the PIT class returns two objects: a spline fit to the PIT samples (a frozen distribution object) and a dictionary of meta-metrics associated with the PIT (detailed below).
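
A minimal sketch of the call, assuming the PIT class is constructed from the PDF ensemble and the true redshifts; the import path and names are assumptions and may differ in the released module:

```python
# Hypothetical call pattern for the PIT metric described above.
from rail.evaluation.metrics.pit import PIT  # assumed import path

pitobj = PIT(fzdata, ztrue)                # ensemble of PDFs + true redshifts
pit_ens, meta_metrics = pitobj.evaluate()  # frozen distribution + meta-metrics dict
```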

PIT values

PIT outlier rate

The PIT outlier rate is a global metric defined as the fraction of galaxies in the sample with extreme PIT values. The lower and upper limits for considering a PIT value an outlier are optional parameters set when the Metrics class is instantiated (default values: PIT $<10^{-4}$ or PIT $>0.9999$).
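
The computation itself is a simple cut on the PIT values; a minimal NumPy sketch, assuming `pit_vals` is an array with one PIT value per galaxy:

```python
import numpy as np

# Fraction of galaxies with extreme PIT values, using the default cuts.
pit_min, pit_max = 1.e-4, 0.9999
outlier_rate = np.mean((pit_vals < pit_min) | (pit_vals > pit_max))
print(f"PIT outlier rate: {outlier_rate:.4f}")
```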

PIT-QQ plot

The histogram of PIT values is a useful tool for a qualitative assessment of PDF quality. It shows whether the PDFs are biased (tilted histogram), under-dispersed (excess counts near 0 and 1), over-dispersed (deficit of counts near 0 and 1), or well-calibrated (flat histogram).

Following the standards of the DC1 paper, the PIT histogram is accompanied by the quantile-quantile (QQ) plot, which can be used to qualitatively compare the PIT distribution obtained from the PDFs against the ideal case (a uniform distribution). The closer the QQ plot is to the diagonal, the better the calibration of the PDFs.

The black horizontal line represents the ideal case where the PIT histogram would behave as a uniform distribution U(0,1).
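
The original notebook produces this figure with a helper from utils.py; the sketch below reproduces the idea with plain matplotlib, assuming `pit_vals` is the array of PIT values:

```python
import numpy as np
import matplotlib.pyplot as plt

pit_sorted = np.sort(pit_vals)
qtheory = np.linspace(0., 1., len(pit_sorted))  # quantiles of U(0,1)

fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(6, 8))
# QQ plot: empirical PIT quantiles against the uniform quantiles
ax0.plot(qtheory, pit_sorted, label="QQ")
ax0.plot([0, 1], [0, 1], "k--", label="ideal")
ax0.set_xlabel("Qtheory")
ax0.set_ylabel("Qdata")
ax0.legend()
# PIT histogram with the ideal flat U(0,1) level marked
ax1.hist(pit_vals, bins=50, density=True, histtype="step")
ax1.axhline(1.0, color="k", label="U(0,1)")
ax1.set_xlabel("PIT")
ax1.legend()
plt.show()
```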


Summary statistics of CDF-based metrics

To evaluate the quality of the PDF estimates globally, rail.evaluation provides a set of metrics that compare the empirical distribution of PIT values with the reference uniform distribution, U(0,1).

Kolmogorov-Smirnov

Let's start with the traditional Kolmogorov-Smirnov (KS) test, whose statistic is the maximum difference between the empirical and the expected cumulative distributions of PIT values:

$$ \mathrm{KS} \equiv \max_{PIT} \Big( \left| \ \mathrm{CDF} \small[ \hat{f}, z \small] - \mathrm{CDF} \small[ \tilde{f}, z \small] \ \right| \Big) $$

where $\hat{f}$ is the PIT distribution and $\tilde{f}$ is U(0,1). Therefore, the smaller the KS value, the closer the PIT distribution is to uniform. The evaluate method of the PITKS class returns a named tuple with the statistic and p-value.
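
Since the reference distribution is U(0,1), the same number can be obtained directly from scipy, as in this sketch (again assuming `pit_vals` holds the PIT values):

```python
from scipy import stats

# KS test of the empirical PIT values against the uniform distribution.
ks_result = stats.kstest(pit_vals, "uniform")
print(ks_result.statistic, ks_result.pvalue)
```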

Visual interpretation of the KS statistic:

Cramer-von Mises

Similarly, let's calculate the Cramer-von Mises (CvM) test, a variant of the KS statistic defined as the mean-square difference between the CDFs of the empirical and true PDFs:

$$ \mathrm{CvM}^2 \equiv \int_{-\infty}^{\infty} \Big( \mathrm{CDF} \small[ \hat{f}, z \small] \ - \ \mathrm{CDF} \small[ \tilde{f}, z \small] \Big)^{2} \mathrm{dCDF}(\tilde{f}, z) $$

computed on the distribution of PIT values, which should be uniform if the PDFs are perfect.
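
As a cross-check, scipy (>= 1.6) provides the same test directly:

```python
from scipy import stats

# Cramer-von Mises test of the PIT values against U(0,1).
cvm_result = stats.cramervonmises(pit_vals, "uniform")
print(cvm_result.statistic, cvm_result.pvalue)
```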

Anderson-Darling

Another variation of the KS statistic is the Anderson-Darling (AD) test, a weighted mean-squared difference featuring enhanced sensitivity to discrepancies in the tails of the distribution.

$$ \mathrm{AD}^2 \equiv N_{tot} \int_{-\infty}^{\infty} \frac{\big( \mathrm{CDF} \small[ \hat{f}, z \small] \ - \ \mathrm{CDF} \small[ \tilde{f}, z \small] \big)^{2}}{\mathrm{CDF} \small[ \tilde{f}, z \small] \big( 1 \ - \ \mathrm{CDF} \small[ \tilde{f}, z \small] \big)}\mathrm{dCDF}(\tilde{f}, z) $$

It is possible to remove catastrophic outliers before calculating the integral for the sake of avoiding numerical instability. For instance, Schmidt et al. computed the Anderson-Darling statistic within the interval (0.01, 0.99).
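
scipy.stats.anderson does not support the uniform distribution, so a manual sketch of the statistic is shown below; the trimming and rescaling of the clipped interval follow the outlier-removal idea above and are a choice for this sketch, not the canonical definition:

```python
import numpy as np

def anderson_darling_uniform(pit_vals, cut=(0.01, 0.99)):
    """Sample Anderson-Darling statistic of pit_vals against U(0,1)."""
    u = np.sort(pit_vals)
    u = u[(u > cut[0]) & (u < cut[1])]    # remove catastrophic outliers
    u = (u - cut[0]) / (cut[1] - cut[0])  # rescale trimmed values back to (0, 1)
    n = len(u)
    i = np.arange(1, n + 1)
    # Standard AD formula with F(u) = u for the uniform reference CDF
    return -n - np.sum((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1]))) / n
```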

Kullback-Leibler Divergence

Another way to quantify the difference between two distributions is the Kullback-Leibler divergence (also known as relative entropy), defined as:

$$ \mathrm{D}_{KL}(P||Q) = \int_{-\infty}^{\infty} P(x) \log{\left( \frac{P(x)}{Q(x)} \right)} \, dx $$

In this case,

$$ \mathrm{D}_{KL} = \int_{-\infty}^{\infty} \mathrm{CDF} \small[ \hat{f}, z \small] \ \log{ \left( \frac{\mathrm{CDF} \small[ \hat{f}, z \small]}{\mathrm{CDF}\small[\tilde{f}, z\small]}\right)} \, dx $$
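
A binned approximation is straightforward with scipy; the bin count below is an arbitrary choice for this sketch:

```python
import numpy as np
from scipy import stats

# Approximate D_KL between the PIT distribution and U(0,1) on a grid.
hist, _ = np.histogram(pit_vals, bins=100, range=(0., 1.), density=True)
uniform = np.ones_like(hist)
dkl = stats.entropy(hist, uniform)  # relative entropy in nats
print(f"D_KL = {dkl:.4f}")
```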

CDE Loss

In the absence of true photo-z posteriors, the metric used to evaluate individual PDFs is the Conditional Density Estimate (CDE) Loss, a metric analogous to the root-mean-squared error:

$$ L(f, \hat{f}) \equiv \int \int {\big(f(z | x) - \hat{f}(z | x) \big)}^{2} dzdP(x), $$

where $f(z | x)$ is the true photo-z PDF and $\hat{f}(z | x)$ is the estimated PDF, both in terms of the photometry $x$. Since $f(z | x)$ is unknown, we estimate the CDE loss as described in Izbicki & Lee, 2017 (arXiv:1704.08095):

$$ \mathrm{CDE} = \mathbb{E}\big( \int{{\hat{f}(z | X)}^2 dz} \big) - 2{\mathbb{E}}_{X, Z}\big(\hat{f}(Z, X) \big) + K_{f}, $$

where the first term is the expectation value of the integral of the squared photo-z posterior estimate with respect to the marginal distribution of the covariates X, the second term is the expectation value with respect to the joint distribution of the observables X and the space Z of all possible redshifts (in practice, the centroids of the PDF bins), and the third term is a constant depending on the true conditional densities $f(z | x)$.
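
A sketch of the empirical estimator (up to the constant $K_f$, which is unknown and omitted), assuming `pdfs` is an (n_gal, n_bins) array of PDF values on `zgrid` and `ztrue` holds the true redshifts; the names are illustrative, not the RAIL API:

```python
import numpy as np

def cde_loss(pdfs, zgrid, ztrue):
    # First term: mean over galaxies of the integral of f_hat^2 dz
    term1 = np.mean(np.trapz(pdfs ** 2, x=zgrid, axis=1))
    # Second term: mean of f_hat evaluated at the grid point nearest
    # to each galaxy's true redshift
    nearest = np.argmin(np.abs(zgrid[np.newaxis, :] - ztrue[:, np.newaxis]), axis=1)
    term2 = np.mean(pdfs[np.arange(len(ztrue)), nearest])
    return term1 - 2.0 * term2  # K_f omitted
```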

Summary