Statistical Concepts for Analytics

50 Flashcards

Mean

The mean is the average value of a data set, calculated by summing all values and dividing by the count. It is a measure of central tendency.

Median

The median is the middle value in a sorted data set; with an even number of values, it is the average of the two middle values. It is less affected by outliers and skewed data than the mean.
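
The claim that the median resists outliers can be checked with a small sketch. The salary figures below are hypothetical, chosen only so that one extreme value stands out.

```python
from statistics import mean, median

# Hypothetical data: five salaries (in thousands), then one extreme value added.
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [400]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this example the single outlier more than doubles the mean, while the median moves by only one unit.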

Mode

The mode is the value that occurs most frequently in a data set. It is especially useful for categorical data, where it identifies the most common category.

Standard Deviation

Standard deviation measures the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean, whereas a high standard deviation indicates the values are spread out.

Variance

Variance is the average squared deviation from the mean; it is the square of the standard deviation. A high variance indicates a wide spread around the mean, and a low variance indicates a narrow spread.
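
As a rough illustration, variance and standard deviation can be computed with Python's `statistics` module. The data set is hypothetical. Note one detail the card does not cover: the population versions divide by n, while the sample versions divide by n - 1.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data with mean 5; squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)    # population variance: 32 / 8
samp_var = variance(data)    # sample variance: 32 / 7
pop_sd = pstdev(data)        # square root of the population variance
samp_sd = stdev(data)        # square root of the sample variance

print(pop_var, pop_sd)
print(samp_var, samp_sd)
```

The standard deviation is often easier to interpret because it is in the same units as the data, whereas the variance is in squared units.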

Skewness

Skewness measures the asymmetry of the probability distribution of a real-valued random variable. Positive skewness indicates a longer tail on the right side, and negative skewness indicates a longer tail on the left side.

Kurtosis

Kurtosis is a measure of the 'tailedness' of a probability distribution. Excess kurtosis is kurtosis minus 3, the kurtosis of a normal distribution: positive values indicate heavier tails than a normal distribution, and negative values indicate lighter tails.
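
One way to make skewness and kurtosis concrete is the moment-based sketch below. It uses the simple population formulas (third and fourth central moments scaled by the variance), which is an assumption here; statistical packages often apply small-sample corrections that give slightly different numbers. The data sets are hypothetical.

```python
from statistics import fmean

def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    m = fmean(xs)
    return fmean([(x - m) ** k for x in xs])

def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    # Subtract 3 so that a normal distribution scores 0.
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print(skewness(symmetric), skewness(right_tailed))
print(excess_kurtosis(symmetric))
```

The symmetric data has zero skewness, while the data with one large value on the right has positive skewness.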

Correlation

Correlation measures the strength and direction of the linear relationship between two variables. Values range from -1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear relationship.
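
The following sketch computes the Pearson correlation coefficient by hand, assuming that is the correlation the card refers to. The function name `pearson_r` and the data are illustrative.

```python
from math import sqrt
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])     # points on an increasing line
r_down = pearson_r(x, [10, 8, 6, 4, 2])   # points on a decreasing line
r_curve = pearson_r(x, [4, 1, 0, 1, 4])   # symmetric parabola

print(r_up, r_down, r_curve)
```

The parabola case shows why the word "linear" matters: the two variables are perfectly related, yet the correlation is zero because the relationship is not a straight line.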

Regression Analysis

Regression analysis estimates the relationship between a dependent variable and one or more independent variables. It is used to predict outcomes and to assess which variables have a significant effect on the response variable.

P-value

The p-value is the probability of observing a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (conventionally ≤ 0.05) is taken as evidence against the null hypothesis; it is not the probability that the null hypothesis is true.
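
As a minimal sketch, the p-value for a two-sided z-test can be computed from the standard normal distribution. The choice of a z-test and the function name `two_sided_p_value` are assumptions made for illustration; other tests use other reference distributions.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal test statistic under the null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_borderline = two_sided_p_value(1.96)   # close to the 0.05 threshold
p_none = two_sided_p_value(0.0)          # statistic exactly at the null value

print(p_borderline, p_none)
```

A statistic of 1.96 sits almost exactly at the conventional 0.05 threshold, which is why that critical value appears so often in textbooks.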

Null Hypothesis

The null hypothesis ($H_0$) is a general statement or default position that there is no relationship between two measured phenomena or no association among groups.

Alternative Hypothesis

The alternative hypothesis ($H_a$) is the claim the researcher seeks evidence for. It states that there is an effect or relationship between variables, in contrast to the null hypothesis.

Type I Error

A Type I error occurs when a true null hypothesis is rejected (false positive). The probability of committing a Type I error is denoted by the Greek letter alpha ($\alpha$).

Type II Error

A Type II error occurs when a false null hypothesis is not rejected (false negative). The probability of committing a Type II error is denoted by the Greek letter beta ($\beta$).

Confidence Interval

A confidence interval quantifies the uncertainty in an estimate. It is a range of values computed from the sample by a procedure that captures the true population parameter in a stated proportion of repeated samples, the confidence level (often 95%).
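
A rough sketch of a confidence interval for a mean follows. It assumes the normal approximation (a z critical value); for small samples a t critical value would usually be more appropriate. The function name and the measurements are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    centre = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return centre - margin, centre + margin

# Hypothetical repeated measurements of the same quantity.
sample = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0]

low95, high95 = mean_confidence_interval(sample, 0.95)
low99, high99 = mean_confidence_interval(sample, 0.99)
print(low95, high95)
print(low99, high99)
```

Raising the confidence level from 95% to 99% widens the interval: more confidence costs precision.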

Population

The population in statistics is the entire group that you want to draw conclusions about.

Sample

A sample is a subset of individuals from a larger population, used to make inferences about the population.

Sampling Distribution

A sampling distribution is the probability distribution of a statistic, such as the sample mean, over all possible samples of a given size drawn from a specific population.

Central Limit Theorem

The Central Limit Theorem states that the sampling distribution of the mean of independent observations is approximately normal when the sample size is large enough, regardless of the shape of the population's distribution, provided the population has finite variance.
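
The theorem can be explored with a small simulation. This sketch assumes an exponential population (strongly right-skewed, mean 1, standard deviation 1) and a fixed random seed; the sample size of 50 and the 2,000 repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1."""
    return fmean([rng.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2000)]

print("centre of sample means:", fmean(means))    # expected near 1
print("spread of sample means:", pstdev(means))   # expected near 1 / sqrt(n)
print("theoretical spread:    ", 1 / sqrt(n))
```

Although individual draws are heavily skewed, the sample means cluster symmetrically around the population mean with a spread close to the population standard deviation divided by the square root of the sample size.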

Z-Score

The Z-score expresses how many standard deviations a value lies above or below the mean of a group of values. A Z-score of 0 means the value is exactly at the mean.
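
A short sketch of the calculation: subtract the mean, then divide by the standard deviation. The population standard deviation is used here, which is an assumption; the scores are hypothetical and chosen so that the mean is 5 and the standard deviation is 2.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(scores)
print(z)
```

The two scores equal to 5 get a Z-score of 0, and the score of 9 lies two standard deviations above the mean.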

T-Test

The t-test is a statistical test used to compare the means of two groups, or to compare a sample mean to a known value, when the population standard deviation is unknown. It is particularly relevant when the sample size is small.
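
The one-sample case can be sketched as below. This computes only the t statistic and its degrees of freedom; turning it into a p-value requires the t distribution, which Python's standard library does not provide. The measurements and the reference value of 5.0 are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean is mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # uses the sample SD
    return (mean(sample) - mu0) / standard_error

sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t_stat = one_sample_t(sample, 5.0)
degrees_of_freedom = len(sample) - 1

print(t_stat, degrees_of_freedom)
```

The statistic measures how far the sample mean is from the reference value in units of its estimated standard error.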

ANOVA (Analysis of Variance)

ANOVA is a statistical method used to test for differences among the means of two or more groups, most often three or more. It assesses the effect of one or more factors by comparing the variance between group means with the variance within groups.

Chi-Square Test

The Chi-square test is used to determine whether there is a significant association between categorical variables. It compares the observed frequencies to the expected frequencies.
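
The comparison of observed and expected frequencies can be sketched for a contingency table. The sketch assumes a test of independence, where each expected count is (row total × column total) / grand total; the tables are hypothetical, and converting the statistic to a p-value would need the chi-square distribution, which is not shown.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

associated = [[30, 10], [20, 40]]    # rows and columns appear related
independent = [[10, 20], [20, 40]]   # rows are exact multiples of each other

stat_associated = chi_square_statistic(associated)
stat_independent = chi_square_statistic(independent)
print(stat_associated, stat_independent)
```

When the observed counts match the expected counts exactly, the statistic is zero; the further they diverge, the larger it grows.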

Statistical Significance

A result is statistically significant when it would be unlikely to occur by random chance alone if the null hypothesis were true. It is typically assessed by comparing a p-value with a predefined threshold, such as 0.05.

Time Series Analysis

Time series analysis comprises statistical techniques for analyzing time-ordered data, both to extract meaningful statistics and characteristics of the data and, often, to forecast future values.

Categorical Variable

Categorical variables take values that are labels for groups rather than measured quantities, such as 'race', 'sex', or 'educational level'.

Continuous Variable

Continuous variables can take any value within a range, so there are infinitely many possible values between the lowest and highest points of measurement. Examples include time, temperature, and distance.

Discrete Variable

Discrete variables can only take distinct, countable values, typically whole numbers. Examples include counts of occurrences, such as the number of children in a family.

Dependent Variable

A dependent variable is the outcome that is measured in an experiment. It is called dependent because its value is expected to depend on the independent variable.

Independent Variable

An independent variable is the variable that is changed or controlled in a scientific experiment to test the effects on the dependent variable.

Histogram

A histogram is a graphical representation of the distribution of numerical data, where the data is divided into bins or intervals, and the frequency of data in each bin is represented by the height of the bar.

Scatterplot

A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the dataset is plotted as a point whose x-y coordinates correspond to its values for the two variables.

R-Squared

R-squared is the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model.

Outlier

An outlier is an observation point that is distant from other observations. Outliers may be due to variability in measurement or may indicate experimental error; they can distort statistical analyses.

Parameter

In statistics, a parameter is a numerical characteristic of the population, such as the population mean or standard deviation, as opposed to a statistic, which is a characteristic of a sample.

Statistic

A statistic is a numerical characteristic of a sample, calculated from sample data and typically used to estimate the corresponding population parameter.

Quantile

Quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. For example, quartiles are the three cut points that will divide a dataset into four equal-sized groups.

Interquartile Range

The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the upper quartile (75th percentile) and the lower quartile (25th percentile).
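
A minimal sketch of the calculation using `statistics.quantiles`, with hypothetical data. Several quartile conventions exist, so other tools may report slightly different cut points for the same data; this sketch relies on the function's default ("exclusive") method.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)
print("IQR:", iqr)
```

Here the lower and upper quartiles are 3 and 9, so the middle half of the data spans an IQR of 6.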

Coefficient of Variation

The coefficient of variation (CV) is the ratio of the standard deviation to the mean, usually expressed as a percentage. As a relative measure of dispersion, it is particularly useful when comparing data series with different units or widely different means.
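
The unit-free nature of the CV can be sketched as follows. The heights and weights are hypothetical, and the sample standard deviation is used, which is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 81, 95]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
# Changing units (cm to m) rescales mean and SD together, so the CV is unchanged.
cv_height_m = coefficient_of_variation([h / 100 for h in heights_cm])

print(cv_height, cv_weight, cv_height_m)
```

Comparing the raw standard deviations of centimetres and kilograms would be meaningless, but the two percentages can be compared directly.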

Probability Density Function

A probability density function (PDF) describes the relative likelihood of a continuous random variable taking values near a given point; the probability of any single exact value is zero. The area under the PDF curve between two values represents the probability that the random variable falls within that range.
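
The "area under the curve" idea can be checked numerically. This sketch assumes a standard normal distribution and approximates the area with a simple midpoint rule; the step count is an arbitrary choice.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)

def area_under_pdf(a, b, steps=10_000):
    """Midpoint-rule approximation of the PDF's integral from a to b."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

approx = area_under_pdf(-1, 1)
exact = dist.cdf(1) - dist.cdf(-1)   # same probability from the CDF

print(approx, exact)
```

Both values are about 0.68, the familiar probability that a normal variable falls within one standard deviation of its mean.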

Confounding Variable

A confounding variable is an extraneous variable in a statistical model that correlates with both the dependent and independent variables. If not controlled, it can cause a spurious association, leading to invalid conclusions about the relationship between the variables.

Linear Regression

Linear regression is a basic and commonly used type of predictive analysis which shows the relationship between two or more variables by fitting a linear equation to observed data. The equation takes the form $y = b_0 + b_1x_1 + \dots + b_nx_n$.
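
For the single-predictor case, fitting the line can be sketched with ordinary least squares, which is assumed here as the fitting method. The helper names `fit_line` and `r_squared` and the data are hypothetical; the second helper computes the share of variance explained by the fitted line.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """1 - (residual sum of squares / total sum of squares)."""
    my = fmean(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x with small noise

b0, b1 = fit_line(xs, ys)
r2 = r_squared(xs, ys, b0, b1)
print(b0, b1, r2)
```

The fitted slope is close to 2 and R-squared is close to 1, reflecting data that lies almost exactly on a straight line.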

Logistic Regression

Logistic regression is used when the dependent variable is binary (1/0, True/False, Yes/No). It models the log-odds (logit) of the outcome as a linear function of the predictors and converts the result to a probability with the logistic function.
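
The link between a linear score and a probability can be sketched as below. The coefficients `b0` and `b1` are hypothetical values chosen for illustration, not fitted from data, and fitting itself is not shown.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return log(p / (1 - p))

def sigmoid(z):
    """Logistic function: the inverse of the logit, mapping any z into (0, 1)."""
    return 1 / (1 + exp(-z))

def predict_probability(x, b0, b1):
    """Predicted probability of the outcome for one predictor x."""
    return sigmoid(b0 + b1 * x)

b0, b1 = -4.0, 1.0   # hypothetical coefficients
for x in (2, 4, 6):
    print(x, predict_probability(x, b0, b1))
```

With these coefficients the predicted probability is exactly 0.5 at x = 4, where the linear score is zero, and rises toward 1 as x increases.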

Survival Analysis

Survival analysis is a branch of statistics for analyzing the expected duration of time until an event of interest occurs, such as death in biological organisms or failure in mechanical systems.

Power of a Test

The power of a statistical test is the probability that the test will reject a false null hypothesis (i.e., the probability of not making a Type II error), equal to $1 - \beta$.
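
As a rough sketch, power can be computed in closed form for a one-sided z-test with known standard deviation. That test, the function name `z_test_power`, and the effect sizes used are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Probability of rejecting the null when the true mean shift is `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection cutoff under the null
    shift = effect * sqrt(n) / sigma         # how far the truth moves the statistic
    return 1 - std_normal.cdf(z_crit - shift)

power_small_n = z_test_power(effect=0.5, sigma=1.0, n=10)
power_large_n = z_test_power(effect=0.5, sigma=1.0, n=25)
power_no_effect = z_test_power(effect=0.0, sigma=1.0, n=25)

print(power_small_n, power_large_n, power_no_effect)
```

Power rises with sample size, and when there is no true effect the rejection probability falls back to the significance level.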

Bayesian Statistics

Bayesian statistics is an approach to statistics in which all forms of uncertainty, including uncertainty about parameters, are expressed in terms of probability. Prior beliefs are updated with observed data, unlike the frequentist approach, which bases inferences on the sample data alone.

Multivariate Analysis

Multivariate analysis deals with the statistical analysis of data collected on more than one dependent variable.

Factor Analysis

Factor analysis is a technique used to reduce a large number of variables to a smaller number of factors. It extracts the maximum common variance from all variables and puts it into a common score.

Principal Component Analysis (PCA)

PCA is a dimensionality-reduction method that transforms a large set of variables into a smaller one that still contains most of the information in the large set.

Nonparametric Statistics

Nonparametric statistics refers to statistical methods that do not assume a specific distribution for the data. They are often used when the data are not normally distributed or otherwise fail the assumptions of traditional parametric tests.

© Hypatia.Tech. 2024 All rights reserved.