KernelTest#
- class QuadratiK.kernel_test.KernelTest(h=None, method='subsampling', num_iter=150, b=0.9, quantile=0.95, mu_hat=None, sigma_hat=None, centering_type='nonparam', alternative=None, k_threshold=10, random_state=None, n_jobs=8)#
Class for performing the kernel-based quadratic distance goodness-of-fit tests using the Gaussian kernel with tuning parameter h. Depending on the input y, the class performs the test of multivariate normality, the non-parametric two-sample test, or the k-sample test.
Parameters#
- h : float, optional
Bandwidth for the kernel function.
- method : str, optional
The method used for critical value estimation (“subsampling”, “bootstrap”, or “permutation”).
- num_iter : int, optional
The number of iterations to use for critical value estimation. Defaults to 150.
- b : float, optional
The size of the subsamples used in the subsampling algorithm. Defaults to 0.9.
- quantile : float, optional
The quantile to use for critical value estimation. Defaults to 0.95.
- mu_hat : numpy.ndarray, optional
Mean vector of the reference distribution. Defaults to None.
- sigma_hat : numpy.ndarray, optional
Covariance matrix of the reference distribution. Defaults to None.
- alternative : str, optional
Type of alternative used by the tuning parameter selection algorithm to compute h when h is not provided. Defaults to None.
- k_threshold : int, optional
Maximum number of groups allowed. Defaults to 10. Increase this value when the data contain more than 10 groups.
- random_state : int, None, optional
Seed for random number generation. Defaults to None.
- n_jobs : int, optional
Maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. For more information on the joblib n_jobs parameter, see https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html. Defaults to 8.
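The roles of num_iter, b, and quantile in critical value estimation can be illustrated with a generic subsampling sketch. This is not the library's implementation: the function names subsampling_critical_value and toy_statistic are hypothetical, the statistic is a simple placeholder rather than the kernel-based one, and b is assumed to be the fraction of each sample kept in a subsample.

```python
import numpy as np


def toy_statistic(x, y):
    """Placeholder statistic: squared distance between the sample means."""
    return float(np.sum((x.mean(axis=0) - y.mean(axis=0)) ** 2))


def subsampling_critical_value(x, y, statistic, num_iter=150, b=0.9,
                               quantile=0.95, random_state=None):
    """Estimate a critical value by subsampling from the pooled sample."""
    rng = np.random.default_rng(random_state)
    pooled = np.vstack([x, y])
    # Assumption: b is the fraction of each sample kept in a subsample.
    n_sub = max(2, int(round(b * len(x))))
    m_sub = max(2, int(round(b * len(y))))
    null_stats = np.empty(num_iter)
    for i in range(num_iter):
        # Draw without replacement from the pooled data, then split in two,
        # so both pseudo-samples come from the same distribution (H0).
        idx = rng.choice(len(pooled), size=n_sub + m_sub, replace=False)
        null_stats[i] = statistic(pooled[idx[:n_sub]], pooled[idx[n_sub:]])
    # The critical value is the requested quantile of the null statistics.
    return float(np.quantile(null_stats, quantile))


rng = np.random.default_rng(42)
x = rng.standard_normal((100, 5))
y = rng.standard_normal((100, 5))
cv = subsampling_critical_value(x, y, toy_statistic, random_state=42)
# Reject H0 when the observed statistic exceeds the critical value.
h0_rejected = toy_statistic(x, y) > cv
```

Raising num_iter gives a more stable quantile estimate at the cost of run time, and fixing random_state makes the estimated critical value reproducible.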
Attributes#
- test_type_ : str
The type of test performed on the data
- execution_time : float
Time taken for the test method to execute
- h0_rejected_ : boolean
Whether the null hypothesis is rejected (True) or not (False)
- test_statistic_ : float
Test statistic of the performed test type
- cv_ : float
Critical value
- cv_method_ : str
Critical value method used for performing the test
References#
Markatou M., Saraceno G., Chen Y. (2023). “Two- and k-Sample Tests Based on Quadratic Distances.” Manuscript, Department of Biostatistics, University at Buffalo.
Lindsay B.G., Markatou M., Ray S. (2014). “Kernels, Degrees of Freedom, and Power Properties of Quadratic Distance Goodness-of-Fit Tests.” Journal of the American Statistical Association, 109(505), 395-410. DOI: 10.1080/01621459.2013.836972
Examples#
>>> # Example for normality test
>>> import numpy as np
>>> from QuadratiK.kernel_test import KernelTest
>>> np.random.seed(42)
>>> data = np.random.randn(100,5)
>>> normality_test = KernelTest(h=0.4, centering_type="param", random_state=42).test(data)
>>> print("Test : {}".format(normality_test.test_type_))
>>> print("Execution time: {:.3f}".format(normality_test.execution_time))
>>> print("H0 is Rejected : {}".format(normality_test.h0_rejected_))
>>> print("Test Statistic : {}".format(normality_test.test_statistic_))
>>> print("Critical Value (CV) : {}".format(normality_test.cv_))
>>> print("CV Method : {}".format(normality_test.cv_method_))
>>> print("Selected tuning parameter : {}".format(normality_test.h))
... Test : Kernel-based quadratic distance Normality test
... Execution time: 0.096
... H0 is Rejected : False
... Test Statistic : -8.588873037044384e-05
... Critical Value (CV) : 0.0004464111809800183
... CV Method : Empirical
... Selected tuning parameter : 0.4
>>> # Example for two sample test
>>> import numpy as np
>>> from QuadratiK.kernel_test import KernelTest
>>> np.random.seed(42)
>>> X = np.random.randn(100,5)
>>> np.random.seed(42)
>>> Y = np.random.randn(100,5)
>>> two_sample_test = KernelTest(h=0.4, centering_type="param").test(X,Y)
>>> print("Test : {}".format(two_sample_test.test_type_))
>>> print("Execution time: {:.3f}".format(two_sample_test.execution_time))
>>> print("H0 is Rejected : {}".format(two_sample_test.h0_rejected_))
>>> print("Test Statistic : {}".format(two_sample_test.test_statistic_))
>>> print("Critical Value (CV) : {}".format(two_sample_test.cv_))
>>> print("CV Method : {}".format(two_sample_test.cv_method_))
>>> print("Selected tuning parameter : {}".format(two_sample_test.h))
... Test : Kernel-based quadratic distance two-sample test
... Execution time: 0.092
... H0 is Rejected : False
... Test Statistic : -0.019707895277270022
... Critical Value (CV) : 0.003842482597612725
... CV Method : subsampling
... Selected tuning parameter : 0.4
Methods
stats() | Function to generate descriptive statistics per variable (and per group if available). |
summary([print_fmt]) | Summary function generates a table for the kernel test results and the summary statistics. |
test(x[, y]) | Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h. |
- KernelTest.stats()#
Function to generate descriptive statistics per variable (and per group if available).
Returns#
- summary_stats_df : pandas.DataFrame
Dataframe of descriptive statistics
- KernelTest.summary(print_fmt='simple_grid')#
Summary function generates a table for the kernel test results and the summary statistics.
Parameters#
- print_fmt : str, optional
Table format used to print the output. Defaults to “simple_grid”. All formats supported by tabulate are accepted; see https://pypi.org/project/tabulate/
Returns#
- summary : str
A string formatted in the desired output format with the kernel test results and summary statistics.
- KernelTest.test(x, y=None)#
Function to perform the kernel-based quadratic distance tests using the Gaussian kernel with bandwidth parameter h. Depending on the shape of y, the function performs the test of multivariate normality, the non-parametric two-sample test, or the k-sample test.
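To give a feel for how a Gaussian kernel with bandwidth h turns two samples into a single distance value, here is a minimal NumPy sketch. It is a simplified, uncentered illustration and not the statistic computed by the library: the function names gaussian_kernel_matrix and two_sample_kernel_statistic are hypothetical, and the unnormalized kernel exp(-||a - b||^2 / (2 h^2)) is an assumed form.

```python
import numpy as np


def gaussian_kernel_matrix(a, b, h):
    """Gaussian kernel exp(-||a_i - b_j||^2 / (2 h^2)) for all row pairs."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * h ** 2))


def two_sample_kernel_statistic(x, y, h):
    """U-statistic style kernel distance between two samples (illustrative)."""
    n, m = len(x), len(y)
    k_xx = gaussian_kernel_matrix(x, x, h)
    k_yy = gaussian_kernel_matrix(y, y, h)
    k_xy = gaussian_kernel_matrix(x, y, h)
    # Within-sample terms exclude the diagonal (i == j) pairs.
    within_x = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    within_y = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    cross = k_xy.sum() / (n * m)
    return float(within_x + within_y - 2.0 * cross)


rng = np.random.default_rng(0)
x = rng.standard_normal((100, 2))
y_same = rng.standard_normal((100, 2))           # same distribution as x
y_shifted = rng.standard_normal((100, 2)) + 2.0  # mean shifted by 2 per coordinate

same = two_sample_kernel_statistic(x, y_same, h=1.5)
shifted = two_sample_kernel_statistic(x, y_shifted, h=1.5)
```

In this sketch the value stays close to zero when both samples come from the same distribution and grows when they differ. Because the within-sample terms drop the diagonal, it can be slightly negative for very similar samples.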
Parameters#
- x : numpy.ndarray or pandas.DataFrame
A numeric array of data values.
- y : numpy.ndarray or pandas.DataFrame, optional
A numeric array of data values (for the two-sample test) or a 1D array of class labels (for the k-sample test). Defaults to None, in which case the normality test is performed.
Returns#
- self : object
Fitted estimator
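The three input layouts accepted by test(x, y) can be prepared as follows. This is a sketch under stated assumptions: the data are synthetic, the two-sample inputs are assumed to share the same number of columns, and the bandwidth h=1.5 in the commented calls is an arbitrary illustrative value. The KernelTest calls are left as comments so that the snippet runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(42)

# Normality test: a single (n, d) data array, with y left as None.
x_norm = rng.standard_normal((100, 5))

# Two-sample test: x and y are both data arrays
# (assumed here to have the same number of columns).
x_two = rng.standard_normal((100, 5))
y_two = rng.standard_normal((100, 5)) + 0.5

# k-sample test: x stacks all groups row-wise, y is a 1D array of labels
# giving the group of each row of x.
groups = [rng.standard_normal((50, 2)) + shift for shift in (0.0, 0.0, 1.0)]
x_k = np.vstack(groups)
y_k = np.repeat([1, 2, 3], 50)

# With QuadratiK installed, the corresponding calls would look like:
# from QuadratiK.kernel_test import KernelTest
# KernelTest(h=1.5, centering_type="param", random_state=42).test(x_norm)
# KernelTest(h=1.5, random_state=42).test(x_two, y_two)
# KernelTest(h=1.5, random_state=42).test(x_k, y_k)
```

With more than 10 distinct labels in y_k, the k_threshold argument of the constructor would need to be raised accordingly.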