Statistical Tests for Model Comparison
t-test
The t-test compares the means of two groups or models. The standard (Student's) two-sample version assumes approximately normal data and equal variances in both groups. The test statistic is calculated and compared to a critical value from the t-distribution with the appropriate degrees of freedom. If the absolute value of the statistic exceeds the critical value, the null hypothesis that the means are equal is rejected.
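As a minimal sketch of the calculation described above, assuming the equal-variance (pooled) form of the test: the function name `pooled_t_statistic` and the accuracy scores are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (equal-variance assumption).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical accuracy scores from five independent runs of each model.
model_a = [0.80, 0.82, 0.84, 0.86, 0.88]
model_b = [0.70, 0.72, 0.74, 0.76, 0.78]
t, df = pooled_t_statistic(model_a, model_b)

# Two-sided critical value of the t-distribution for df = 8 at alpha = 0.05.
CRITICAL_T_DF8 = 2.306
reject_null = abs(t) > CRITICAL_T_DF8
```

Here the statistic is about 5 on 8 degrees of freedom, so the null hypothesis of equal means would be rejected at the 5% level for this made-up data.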
Chi-squared test
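The Chi-squared test is used to compare observed and expected frequencies in categorical data from different models. The expected frequencies are those implied by the null hypothesis, and the Chi-squared statistic sums the squared differences between observed and expected counts, each divided by the expected count. The statistic is compared to the critical value from the Chi-squared distribution for the table's degrees of freedom. If the statistic exceeds the critical value, the null hypothesis of independence is rejected, suggesting a difference in the categorical distributions.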
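The statistic can be sketched for a contingency table as follows. The function name `chi_squared_statistic` and the counts are hypothetical; rows are assumed to be models and columns the predicted categories.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table; returns (stat, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows are two models, columns are two predicted classes.
table = [[30, 20],
         [20, 30]]
stat, df = chi_squared_statistic(table)

# Critical value of the chi-squared distribution for df = 1 at alpha = 0.05.
CRITICAL_CHI2_DF1 = 3.841
reject_null = stat > CRITICAL_CHI2_DF1
```

With these counts every expected cell is 25, the statistic is 4 on 1 degree of freedom, and independence would be rejected at the 5% level.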
McNemar's test
McNemar's test is a non-parametric method used on paired nominal data. It compares the performance of two models on the same dataset. The test is based on the asymmetry of the 2x2 contingency table for the pair of models, that is, on the discordant cases that one model gets right and the other gets wrong. The null hypothesis is that the row and column marginal frequencies are equal.
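A minimal sketch of the test on the two discordant counts, assuming the uncorrected chi-squared form (no continuity correction) with one degree of freedom. The function name `mcnemar` and the counts are hypothetical.

```python
from math import erfc, sqrt


def mcnemar(b, c):
    """McNemar statistic and p-value from the two discordant counts.

    b: cases model A classified correctly and model B did not.
    c: cases model B classified correctly and model A did not.
    """
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical discordant counts from evaluating two models on one test set.
stat, p_value = mcnemar(b=15, c=5)
```

For small discordant counts an exact binomial version is usually preferred over this approximation.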
Wilcoxon signed-rank test
This non-parametric test compares two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ. It is used when the paired differences cannot be assumed to follow a normal distribution, as the paired t-test requires. A significant test result indicates that there is a difference between the paired samples.
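The ranking step can be sketched as below. This computes only the signed-rank statistic W (the smaller of the positive and negative rank sums); zero differences are dropped and tied absolute differences share an average rank. The function name `wilcoxon_statistic` and the scores are hypothetical.

```python
def wilcoxon_statistic(x, y):
    """Wilcoxon signed-rank statistic W; returns (W, number of non-zero pairs)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus), len(ordered)


# Hypothetical scores of two models on the same six datasets.
scores_a = [85, 90, 78, 92, 88, 76]
scores_b = [80, 92, 70, 85, 87, 70]
w, n_pairs = wilcoxon_statistic(scores_a, scores_b)
```

The statistic is then compared with a tabulated critical value for the number of non-zero pairs, or with a normal approximation for larger samples.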
Logrank test
The Logrank test is used to compare the survival distributions of two samples. It is useful in survival analysis for comparing the risk of an event (e.g., system failure) at different time points. The test statistic is calculated under the null hypothesis that there's no difference between the populations in the probability of an event at any time point.
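A rough sketch of the two-group statistic, assuming right-censored data given as a time and an event flag (1 for an observed event, 0 for censored). At each event time it accumulates observed and expected events in the first group plus the hypergeometric variance. The function name `logrank_statistic` and the failure times are hypothetical.

```python
def logrank_statistic(times_a, events_a, times_b, events_b):
    """Logrank chi-squared statistic (1 degree of freedom) for two groups."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for x in times_a if x >= t)
        at_risk_b = sum(1 for x in times_b if x >= t)
        n = at_risk_a + at_risk_b
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        d = d_a + d_b
        observed_a += d_a
        # Expected events in group A if both groups share the same hazard.
        expected_a += at_risk_a * d / n
        if n > 1:
            variance += at_risk_a * at_risk_b * d * (n - d) / (n * n * (n - 1))
    return (observed_a - expected_a) ** 2 / variance


# Hypothetical failure times for two systems; every failure was observed.
stat = logrank_statistic([1, 2], [1, 1], [3, 4], [1, 1])
```

The result is compared against the chi-squared distribution with one degree of freedom; a sample this small is only meant to show the arithmetic.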
Cochran's Q test
Cochran's Q test is a non-parametric statistical test used to determine if three or more treatments have different effects. It is applicable when the data are binary and each subject receives all treatments. The Q statistic is calculated and compared against the chi-squared distribution to test for differences in treatment effects.
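The Q statistic can be sketched as follows, assuming a table with one row per subject (here, test item) and one binary column per treatment (here, model). The function name `cochran_q` and the outcomes are hypothetical.

```python
from math import exp


def cochran_q(rows):
    """Cochran's Q for binary outcomes; returns (Q, degrees of freedom)."""
    k = len(rows[0])                      # number of treatments
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    n = sum(row_totals)                   # total number of successes
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1


# Hypothetical results: 1 means the model classified the item correctly.
outcomes = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
q, df = cochran_q(outcomes)

# Chi-squared upper tail; this closed form is valid only for df = 2.
p_value = exp(-q / 2)
```

For this made-up table Q is 6 on 2 degrees of freedom, just above the 5% critical value of 5.991.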
Bland-Altman analysis
Bland-Altman analysis is not a statistical test per se but a method for assessing agreement between two quantitative measurement methods. For each measured item, the difference between the two methods is plotted against the average of the two. The mean difference (bias) and the limits of agreement, usually the mean difference plus and minus 1.96 standard deviations of the differences, are then used to evaluate the consistency between the two methods.
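A minimal sketch of the quantities behind a Bland-Altman plot, assuming the common 1.96 standard deviation limits of agreement. The function name `bland_altman` and the measurements are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return per-item means, per-item differences, bias and limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)          # assumes roughly normal differences
    return means, diffs, bias, (bias - spread, bias + spread)


# Hypothetical measurements of the same five items by two methods.
method_a = [10, 12, 14, 16, 18]
method_b = [9, 13, 13, 17, 17]
means, diffs, bias, (lower, upper) = bland_altman(method_a, method_b)
```

Plotting `diffs` against `means` with horizontal lines at `bias`, `lower` and `upper` gives the usual Bland-Altman chart.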
Kruskal-Wallis H test
The Kruskal-Wallis H test is a rank-based non-parametric test used to determine if there are statistically significant differences between two or more independent groups on a continuous or ordinal dependent variable. It is the non-parametric counterpart of the one-way ANOVA F-test.
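The H statistic can be sketched as below, under the simplifying assumption that no value occurs more than once (the tie correction is omitted). The function name `kruskal_wallis_h` and the data are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic; assumes all pooled values are distinct."""
    pooled = sorted(v for g in groups for v in g)
    # Rank of each value in the pooled sample (valid only without ties).
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n = len(pooled)
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical scores for three independent groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
```

H is compared against the chi-squared distribution with one fewer degree of freedom than the number of groups; here H is about 7.2 on 2 degrees of freedom.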
ANOVA (Analysis of Variance)
ANOVA is used to compare the means of more than two groups (e.g., models or algorithms) to see if they differ significantly. It is performed by analyzing the variances within and between groups and computing an F-statistic. A significant F-statistic (typically determined by a p-value) suggests the group means are not all equal.
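The F-statistic for a one-way layout can be sketched as the ratio of the between-group to the within-group mean square. The function name `anova_f` and the data are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k


# Hypothetical scores for three models, three runs each.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f, df_between, df_within = anova_f(groups)

# Critical value of the F-distribution for (2, 6) degrees of freedom at alpha = 0.05.
CRITICAL_F_2_6 = 5.14
reject_null = f > CRITICAL_F_2_6
```

For this made-up data F is 21 on (2, 6) degrees of freedom, so the hypothesis that all group means are equal would be rejected.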
Friedman test
The Friedman test is a non-parametric alternative to repeated-measures ANOVA and is used to detect differences between treatments measured on the same subjects or blocks. It ranks the values within each row (block), sums the ranks for each column (treatment), and then calculates a test statistic that approximately follows a chi-squared distribution to determine whether the treatments have similar effects.
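The ranking and summing steps can be sketched as follows, assuming no ties within a row. Rows are blocks (here, datasets) and columns are treatments (here, models). The function name `friedman_statistic` and the scores are hypothetical.

```python
def friedman_statistic(rows):
    """Friedman chi-squared statistic; assumes no tied values within a row."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, value in enumerate(row):
            # Rank within the row: 1 for the smallest value, k for the largest.
            rank_sums[j] += 1 + sum(other < value for other in row)
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical accuracies of three models on four datasets.
scores = [
    [0.90, 0.85, 0.80],
    [0.88, 0.86, 0.70],
    [0.75, 0.80, 0.60],
    [0.92, 0.89, 0.85],
]
stat = friedman_statistic(scores)
```

The statistic is compared against the chi-squared distribution with one fewer degree of freedom than the number of treatments; here it is about 6.5 on 2 degrees of freedom.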
© Hypatia.Tech. 2024 All rights reserved.