What are the missing-data patterns that are generally observed?

The missing-data patterns that are generally observed are:

  • Missing completely at random (MCAR)

  • Missing at random (MAR)

  • Missing that depends on the missing value itself

  • Missing that depends on an unobserved input variable
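The following sketch makes the first three patterns concrete. It masks a toy income column in three ways. The data, the cutoffs and the function names (mask_mcar, mask_mar, mask_mnar, observed_mean) are hypothetical and chosen only for illustration.

Under MCAR, every value has the same chance of going missing. Under MAR, missingness is driven by an observed variable (here age), so the bias it introduces can in principle be corrected using that variable. When the value itself decides whether it goes missing, the observed data alone cannot correct the bias. The fourth pattern would look like mask_mnar, except that the variable driving the missingness is never recorded. The third and fourth patterns are commonly grouped together as missing not at random (MNAR).

```python
import random
from statistics import mean

# Hypothetical toy data: (age, income) pairs, fully observed before masking.
_rng = random.Random(0)
people = [(age, 1000 * age + _rng.randint(-5000, 5000)) for age in range(20, 70)]


def mask_mcar(rows, p=0.3, seed=1):
    """MCAR: every income has the same chance p of going missing."""
    rng = random.Random(seed)
    return [(age, None if rng.random() < p else inc) for age, inc in rows]


def mask_mar(rows, age_cutoff=55):
    """MAR: missingness depends only on an observed variable (age)."""
    return [(age, None if age >= age_cutoff else inc) for age, inc in rows]


def mask_mnar(rows, income_cutoff=55000):
    """Missingness depends on the value itself: high incomes go missing."""
    return [(age, None if inc > income_cutoff else inc) for age, inc in rows]


def observed_mean(rows):
    """Mean income over the rows where income is observed."""
    return mean(inc for _, inc in rows if inc is not None)


if __name__ == "__main__":
    print("full:", observed_mean(people))
    for name, masked in [("MCAR", mask_mcar(people)),
                         ("MAR", mask_mar(people)),
                         ("MNAR", mask_mnar(people))]:
        print(name, observed_mean(masked))
```

Compare each masked mean against the full-data mean. The difference shows how much each pattern shifts a naive estimate that uses only the observed values.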

What is the KNN imputation method?

In KNN imputation, a missing attribute value in a record is imputed from the k records (nearest neighbors) that are most similar to that record. A typical choice is the mean of the neighbors' values for that attribute. A distance function, computed over the attributes both records have observed, determines how similar two records are.
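Below is a minimal KNN imputation sketch using only the standard library. It assumes that records are lists of numbers, with None marking a missing value. It also assumes an unweighted Euclidean distance over the attributes observed in both records, and a plain mean of the k neighbors. The function names knn_impute and distance are hypothetical. Library implementations such as scikit-learn's KNNImputer refine this with scaled distances and optional distance weighting.

```python
import math


def distance(a, b):
    """Euclidean distance over attributes observed (not None) in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=2):
    """Return a copy of rows where each None is replaced by the mean of that
    attribute over the k nearest records that have it observed."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other records that observe attribute j.
            candidates = [r for n, r in enumerate(rows) if n != i and r[j] is not None]
            candidates.sort(key=lambda r: distance(row, r))
            neighbors = candidates[:k]
            if neighbors:
                result[i][j] = sum(r[j] for r in neighbors) / len(neighbors)
    return result


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, None],   # missing third attribute
        [0.9, 1.9, 2.9],
        [8.0, 9.0, 10.0],
    ]
    print(knn_impute(data, k=2))
```

Here the missing value in the second record is filled from the first and third records, which are its two closest neighbors. The distant fourth record is ignored.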

What are the data validation methods used by data analysts?

The methods data analysts usually use for data validation are:

  • Data screening

  • Data verification
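The two methods can be sketched as follows. This is an illustrative interpretation, not a standard definition:

- Data screening checks each record against plausibility rules.
- Data verification compares entered data with a trusted source, such as an independent second entry.

The rule set RULES and the functions screen and verify are hypothetical names introduced for this example.

```python
# Hypothetical screening rules: field -> (criterion text, check function).
RULES = {
    "age": ("age between 0 and 120",
            lambda v: isinstance(v, (int, float)) and 0 <= v <= 120),
    "email": ("email contains '@'",
              lambda v: isinstance(v, str) and "@" in v),
}


def screen(records, rules=RULES):
    """Data screening: return (record_index, field, failed_criterion)
    for every rule a record violates."""
    issues = []
    for i, record in enumerate(records):
        for field, (criterion, check) in rules.items():
            if not check(record.get(field)):
                issues.append((i, field, criterion))
    return issues


def verify(entered, source, key="id"):
    """Data verification: compare entered records with a trusted source
    and return (key, field, detail) for every mismatch."""
    by_key = {record[key]: record for record in source}
    mismatches = []
    for record in entered:
        original = by_key.get(record[key])
        if original is None:
            mismatches.append((record[key], key, "not found in source"))
            continue
        for field, value in record.items():
            if original.get(field) != value:
                mismatches.append(
                    (record[key], field, f"{value!r} != {original.get(field)!r}"))
    return mismatches
```

Screening can catch implausible values without any reference data. Verification can also catch plausible-looking typos, but it needs a second source to compare against.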

What should be done with suspected or missing data?

  • Prepare a validation report that lists all suspected data, including the validation criteria each item failed and the date and time it occurred

  • Experienced personnel should examine the suspicious data to determine whether it is acceptable

  • Invalid data should be assigned a validation code and replaced

  • For missing data, choose the most suitable analysis strategy, such as deletion methods, single imputation methods or model-based methods
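The steps above can be sketched as follows, for one numeric series of timestamped readings. The sketch builds a validation report that records the failed criterion and the time of occurrence, then replaces invalid values with a validation code. Finally, it applies two simple missing-data strategies:

- Deletion
- Single (mean) imputation

The valid range, the code value INVALID_CODE and all function names are hypothetical. Model-based methods are not shown.

```python
from datetime import datetime
from statistics import mean

INVALID_CODE = -9999  # hypothetical validation code marking invalid values


def validation_report(readings, low, high):
    """List every suspected reading with the criterion it failed and the
    date and time at which it occurred. `readings` is [(timestamp, value)]."""
    criterion = f"{low} <= value <= {high}"
    return [
        {"index": i, "value": value, "failed": criterion, "occurred_at": ts}
        for i, (ts, value) in enumerate(readings)
        if value is not None and not low <= value <= high
    ]


def apply_validation_code(readings, report):
    """Replace each invalid value listed in the report with INVALID_CODE."""
    flagged = {entry["index"] for entry in report}
    return [(ts, INVALID_CODE if i in flagged else value)
            for i, (ts, value) in enumerate(readings)]


def is_missing(value):
    return value is None or value == INVALID_CODE


def deletion(values):
    """Deletion method: drop missing or invalid values before analysis."""
    return [v for v in values if not is_missing(v)]


def mean_imputation(values):
    """Single imputation: fill missing or invalid values with the mean of the valid ones."""
    fill = mean(deletion(values))
    return [fill if is_missing(v) else v for v in values]


if __name__ == "__main__":
    readings = [(datetime(2024, 1, 1, 8), 21.5), (datetime(2024, 1, 1, 9), 85.0),
                (datetime(2024, 1, 1, 10), None), (datetime(2024, 1, 1, 11), 22.5)]
    report = validation_report(readings, low=-40, high=60)
    values = [v for _, v in apply_validation_code(readings, report)]
    print(report)
    print(deletion(values), mean_imputation(values))
```

The report is produced before any value is overwritten. That way, experienced personnel can still review the original suspicious reading.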

How do you deal with multi-source problems?

To deal with multi-source problems:

  • Restructure the schemas to achieve schema integration

  • Identify similar records and merge them into a single record containing all relevant attributes without redundancy
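Both steps can be sketched in a few lines:

1. Map each source's columns onto one integrated schema.
2. Merge records that refer to the same entity.

The column mappings in SCHEMA_MAP and the matching rule are hypothetical. The rule treats records with the same normalized email as one entity. Real record linkage often needs fuzzy matching on several attributes.

```python
# Hypothetical per-source column mappings onto one integrated schema.
SCHEMA_MAP = {
    "crm": {"cust_name": "name", "mail": "email", "tel": "phone"},
    "billing": {"customer": "name", "email_address": "email", "city": "city"},
}


def to_common_schema(record, source):
    """Schema integration: rename source-specific columns to shared names."""
    mapping = SCHEMA_MAP[source]
    return {mapping[col]: val for col, val in record.items() if col in mapping}


def match_key(record):
    """Hypothetical matching rule: same normalized email means same entity."""
    return record["email"].strip().lower()


def merge_sources(sources):
    """Merge matching records into one record per entity, keeping each
    attribute once (the first non-empty value seen)."""
    merged = {}
    for source, records in sources.items():
        for rec in records:
            common = to_common_schema(rec, source)
            target = merged.setdefault(match_key(common), {})
            for field, value in common.items():
                if value and not target.get(field):
                    target[field] = value
    return list(merged.values())
```

For example, a CRM record and a billing record for the same customer become one record. That record carries the name, email, phone and city, with no duplicated fields.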

What is an Outlier?

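Outlier is a term analysts commonly use for a value that lies far away from, and diverges from, the overall pattern in a sample. There are two types of outliers:

  • Univariate: extreme in the distribution of a single variable

  • Multivariate: unusual in the combination of values across two or more variables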
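One common detection rule for each type can be sketched as follows. Other rules exist; these are just two well-known choices:

- Univariate outliers: Tukey's IQR fences.
- Multivariate outliers: the Mahalanobis distance from the sample mean, which accounts for correlation between variables.

The sketch is limited to two variables, assumes a non-singular covariance matrix, and uses function names introduced for illustration.

```python
import math
from statistics import mean, quantiles


def univariate_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean,
    using the sample covariance matrix (assumed non-singular)."""
    n = len(points)
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances
```

A point such as (2, 9), added to data lying close to the line y = x, is not extreme in x or in y on its own. However, it has by far the largest Mahalanobis distance, which makes it a multivariate outlier. A common cutoff for such distances, assuming roughly Gaussian data, is the square root of a chi-square quantile with as many degrees of freedom as variables.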

What is the Hierarchical Clustering Algorithm?

A hierarchical clustering algorithm either merges existing groups (agglomerative, bottom-up) or divides them (divisive, top-down). This creates a hierarchical structure that shows the order in which groups are merged or divided.
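The sketch below shows the agglomerative (bottom-up) variant on one-dimensional points. It uses single linkage, where the distance between two groups is the distance between their closest members. It returns the merge history, which is the order that a dendrogram would display. The function name is hypothetical. The brute-force search is only suitable for small inputs, and the divisive variant is not shown.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering on 1-D points.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return history


if __name__ == "__main__":
    for step in agglomerative([1, 2, 10, 11, 25]):
        print(step)
```

Cutting the history at a chosen distance yields a flat clustering. For example, stopping before merges at distance 8 or more leaves the groups {1, 2}, {10, 11} and {25}.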

What is the K-means Algorithm?

K-means is a well-known partitioning method. Objects are classified as belonging to one of K groups, with K chosen a priori.

The K-means algorithm assumes that:

  • The clusters are spherical: the data points in a cluster are centered around that cluster's mean (centroid)

  • The variance/spread of the clusters is similar: each data point belongs to the closest cluster
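The sketch below shows how these assumptions are used, through the standard Lloyd iteration. It alternates two steps until the centroids stop moving:

1. Assign each point to the closest centroid.
2. Move each centroid to the mean of its assigned points.

It assumes points are numeric tuples and picks the initial centroids at random from the data. The seed parameter and the function name are illustrative choices, and no smarter initialization such as k-means++ is included.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, clusters); the clusters are
    those from the final assignment step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
    print(kmeans(pts, k=2))
```

Because points are assigned by plain Euclidean distance to a single centroid, K-means works best when the spherical, similar-spread assumptions above roughly hold. Elongated or very unequal clusters tend to be split or merged incorrectly.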


