Data Quality Issues

Noisy Data

Noisy data contains random error or variance in a measured variable, which makes underlying patterns harder to identify and can produce incorrect data mining outputs.
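One common way to reduce noise is smoothing. The sketch below is an illustration only, not a method named on the card: it applies a centered moving average to hypothetical sensor readings. The function name `moving_average`, the window size, and the sample values are all assumptions.

```python
from statistics import mean

def moving_average(values, window=3):
    """Centered moving average; positions near the edges use a shorter window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed

# Hypothetical readings: a slow upward trend with random error on top.
readings = [10, 13, 11, 15, 12, 17, 14]
smoothed = moving_average(readings, window=3)

# The smoothed series varies less than the raw one, so the trend is easier to see.
raw_spread = max(readings) - min(readings)
smoothed_spread = max(smoothed) - min(smoothed)
```

A larger window removes more noise but also blurs real short-term changes, so the window size is a judgment call.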

Bias in Data

Bias occurs when a dataset does not represent the population it's intended to reflect, potentially leading to inaccurate assumptions and predictions about broader groups.
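A simple check for this kind of bias is to compare each group's share in the sample with its known share in the population. The sketch below assumes the population shares are available from an external source. The group names, the shares, and the helper names `group_shares` and `representation_gap` are hypothetical.

```python
from collections import Counter

def group_shares(labels):
    """Fraction of the sample that falls in each group."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}

def representation_gap(sample_labels, population_shares):
    """Sample share minus population share, per group (positive = over-represented)."""
    sample = group_shares(sample_labels)
    return {
        group: sample.get(group, 0.0) - share
        for group, share in population_shares.items()
    }

# Hypothetical population split and a sample that over-represents one group.
population_shares = {"urban": 0.5, "rural": 0.5}
sample_labels = ["urban"] * 8 + ["rural"] * 2

gaps = representation_gap(sample_labels, population_shares)
# "urban" comes out about 0.3 above its population share, "rural" about 0.3 below.
```

A large gap for any group suggests that conclusions drawn from the sample may not carry over to the full population.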

Missing Values

Missing values occur when no data value is stored for a variable in an observation. They can bias analysis and produce inaccurate models. Impacts include reduced statistical power, skewed results, and poor decision-making.
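The sketch below shows one way to count missing values and fill them with the mean of the observed values. Mean imputation is just one option, chosen here for brevity, and it can itself distort the distribution. The record layout, the field name `age`, and the helper names are assumptions.

```python
from statistics import mean

def count_missing(rows, field):
    """Number of rows where the field is absent or None."""
    return sum(1 for row in rows if row.get(field) is None)

def impute_mean(rows, field):
    """Return new rows with missing values replaced by the mean of observed ones."""
    observed = [row[field] for row in rows if row.get(field) is not None]
    fill = mean(observed)
    return [
        {**row, field: fill if row.get(field) is None else row[field]}
        for row in rows
    ]

# Hypothetical records; the second one has no stored age.
rows = [{"age": 30}, {"age": None}, {"age": 40}, {"age": 50}]

missing_before = count_missing(rows, "age")  # 1
imputed = impute_mean(rows, "age")
missing_after = count_missing(imputed, "age")  # 0
```

The original list is left unchanged, which keeps a record of which values were actually observed.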

Outliers

Outliers are data points that deviate significantly from the majority of the data distribution, which can distort statistical analyses and models, leading to potential misinterpretation of the data.
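One widely used rule of thumb flags values that lie more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The card does not name a detection method, so the sketch below is an illustrative assumption, as are the function name `iqr_outliers` and the sample values.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements with one value far from the rest.
data = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(data)  # [95]
```

Flagging a value is not the same as deleting it: an outlier may be a recording error or a genuine rare event, and that decision needs domain knowledge.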

Inconsistent Data

Inconsistency occurs when the same entity appears under different codes or names in a dataset, often because of human error or because data from several sources was merged. It reduces the accuracy of the results.
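A typical fix is to map every known variant to one canonical code. The sketch below assumes a hand-maintained lookup table. The table contents, the name `CANONICAL`, and the function `normalize_country` are hypothetical.

```python
# Hypothetical lookup table: lowercase variant -> canonical code.
CANONICAL = {
    "us": "US",
    "usa": "US",
    "u.s.": "US",
    "united states": "US",
    "gb": "GB",
    "uk": "GB",
    "united kingdom": "GB",
}

def normalize_country(value):
    """Map a known variant to its canonical code; unknown values pass through trimmed."""
    cleaned = value.strip()
    return CANONICAL.get(cleaned.lower(), cleaned)

# Values as they might look after merging two sources.
raw = [" USA ", "United States", "uk", "GB", "U.S."]
normalized = [normalize_country(v) for v in raw]
# ['US', 'US', 'GB', 'GB', 'US']
```

Unknown values are passed through rather than guessed, so they can be reviewed and added to the table later.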

Duplicate Data

Duplicate data refers to repeated entries or rows in a dataset. Duplicates can produce misleading results, waste storage, and degrade the performance of data mining algorithms.
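Exact duplicates can be removed by remembering which key combinations have already been seen. The sketch below keeps the first occurrence of each key and preserves the original order. The record layout, the key fields, and the function name `drop_duplicates` are assumptions.

```python
def drop_duplicates(rows, key_fields):
    """Keep the first row for each combination of key field values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical customer records; the third repeats the first.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 1, "email": "a@example.com"},
]
unique_rows = drop_duplicates(rows, key_fields=("id", "email"))
removed = len(rows) - len(unique_rows)  # 1
```

This only catches rows whose key fields match exactly. Near-duplicates, such as the same email with different capitalization, need normalization first.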

© Hypatia.Tech. 2024 All rights reserved.