Data Quality Issues
Noisy Data
Noisy data contains random error or variance in a measured variable, which makes underlying patterns harder to identify and can produce incorrect data mining outputs.
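One common way to dampen random variation is to smooth the measurements. The sketch below uses a simple moving average; the function name `moving_average`, the window size, and the sample readings are illustrative assumptions, not part of the flashcard set.

```python
def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Smooth a sequence by averaging each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical sensor readings with small random fluctuations.
readings = [10, 12, 11, 13, 12]
smoothed = moving_average(readings, window=3)
print(smoothed)
```

A larger window removes more noise but also blurs genuine short-term changes, so the window size is a trade-off rather than a fixed rule.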
Bias in Data
Bias occurs when a dataset does not represent the population it's intended to reflect, potentially leading to inaccurate assumptions and predictions about broader groups.
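A rough check for representation bias is to compare each group's share in the sample with its known share in the population. The sketch below assumes the population shares are available from an external source; the function name `representation_gap`, the group labels, and the numbers are hypothetical.

```python
from collections import Counter

def representation_gap(sample_groups: list[str],
                       population_shares: dict[str, float]) -> dict[str, float]:
    """Return sample share minus population share for each group.

    Positive values mean the group is over-represented in the sample,
    negative values mean it is under-represented.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample is empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }

# Hypothetical sample: 8 urban and 2 rural respondents,
# drawn from a population assumed to be split 50/50.
sample = ["urban"] * 8 + ["rural"] * 2
gaps = representation_gap(sample, {"urban": 0.5, "rural": 0.5})
print(gaps)
```

Large gaps suggest that conclusions drawn from the sample may not generalize to the under-represented groups.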
Missing Values
Missing values occur when no data value is stored for a variable in an observation, which can bias analysis and produce inaccurate models. Impacts include reduced statistical power, skewed results, and poor decision-making.
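Before deciding how to handle missing values, it helps to measure how much is missing per variable. The sketch below treats a value as missing when it is `None` or the key is absent; the function name `missing_report` and the sample records are illustrative assumptions.

```python
def missing_report(rows: list[dict], columns: list[str]) -> dict[str, float]:
    """Return, for each column, the fraction of rows where the value is missing."""
    total = len(rows)
    report = {}
    for col in columns:
        # row.get(col) is None both when the key is absent and when it maps to None.
        missing = sum(1 for row in rows if row.get(col) is None)
        report[col] = missing / total if total else 0.0
    return report

# Hypothetical observations with one missing age and one missing income.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29},
    {"age": 41, "income": 48000},
]
print(missing_report(rows, ["age", "income"]))
```

A report like this can guide the choice between dropping rows, imputing values, or collecting more data.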
Outliers
Outliers are data points that deviate significantly from the bulk of the data. They can distort statistical analyses and models and lead to misinterpretation of the data.
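One widely used heuristic for flagging outliers is the interquartile range (IQR) rule, which marks values more than 1.5 × IQR outside the first or third quartile. The sketch below is one possible implementation using the standard library; the function name `iqr_outliers`, the multiplier `k`, and the sample values are assumptions for illustration.

```python
import statistics

def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements with one extreme value.
data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))
```

A flagged value is a candidate for inspection, not automatically an error; some outliers are genuine observations.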
Inconsistent Data
Inconsistency occurs when codes or names in the dataset disagree, often as a result of human error or of merging data from several sources, which reduces the accuracy of the results.
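Inconsistent codes or names are often repaired by mapping every known variant to one canonical value. The sketch below assumes a hand-maintained alias table; `ALIASES`, `normalize`, and the country labels are hypothetical examples.

```python
# Hypothetical alias table: each known variant maps to one canonical code.
ALIASES = {
    "us": "US",
    "usa": "US",
    "u.s.a.": "US",
    "united states": "US",
    "gb": "GB",
    "uk": "GB",
    "united kingdom": "GB",
}

def normalize(value: str) -> str:
    """Map a raw label to its canonical code, leaving unknown labels trimmed but unchanged."""
    # Lowercase and collapse repeated whitespace so trivial variants share a key.
    key = " ".join(value.strip().lower().split())
    return ALIASES.get(key, value.strip())

raw = ["USA", " u.s.a. ", "United  States", "UK", "France"]
print([normalize(v) for v in raw])
```

Leaving unknown labels unchanged keeps them visible for review instead of silently guessing a mapping.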
Duplicate Data
Duplicate data refers to repeated entries or rows in a dataset. Duplicates can produce misleading results, waste storage, and degrade the performance of data mining algorithms.
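Duplicates can be removed by keeping only the first row seen for each combination of identifying fields. The sketch below assumes exact matches on those fields; the function name `drop_duplicates`, the field names, and the records are illustrative.

```python
def drop_duplicates(rows: list[dict], key_fields: list[str]) -> list[dict]:
    """Keep the first row for each distinct combination of key_fields, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical customer records where id 2 appears twice.
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
    {"id": 2, "name": "Grace"},
    {"id": 3, "name": "Edsger"},
]
print(drop_duplicates(records, ["id", "name"]))
```

Near-duplicates, such as the same person entered with different spellings, need the inconsistent labels resolved first, because exact matching will not catch them.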
© Hypatia.Tech. 2024 All rights reserved.