Data Science – Interview Q & A.
Set-1:
- Difference Between Training & Testing Set?
- Difference Between Validation Set & Testing Set?
- Define Bias & Variance.
- How Will You Handle Missing Values In The Dataset ?
- How Does A Decision Tree Classifier Work ?
- How Is A Logistic Regression Model Evaluated?
- Assumptions Of Linear Regression Model.
- What Is Multicollinearity & How To Handle It?
- Explain Why The Performance Of XGBoost Is Better ?
- Why Is An Encoder & Decoder Model Used In NLP ?
Set-2:
- Difference Between Machine Learning & Artificial Intelligence ?
- Difference Between Deep Learning & Machine Learning .
- What Is Cross Validation ?
- What Are The Types Of Machine Learning ?
- Difference Between Supervised & Unsupervised Machine Learning ?
- What Is Selection Bias ?
- What Is The Difference Between Correlation & Causality ?
- What Is The Difference Between Correlation & Covariance ?
- What Is The Difference Between Supervised & Reinforcement Learning ?
- What Are The Requirements Of A Reinforcement Learning Environment ?
Set-3:
- What Different Targets Do Classification & Regression Algorithms Require ?
- What Are Five Popular Algorithms Used In Machine Learning ?
- What Is A Confusion Matrix ?
- List The Differences Between KNN & K-Means Clustering .
- What Is The Difference Between Type-1 & Type-2 Errors ?
- What Is Semi-Supervised Learning ?
- Where Is Semi-Supervised Learning Applied ?
- What Is Stemming ?
- What Is Lemmatization ?
- What Is PCA ?
Set-4:
- What Are Support Vectors In SVM ?
- In Terms Of Access, How Are Arrays & Linked Lists Different ?
- What Is A P-Value ?
- What Techniques Are Used To Find Similarity In A Recommendation System ?
- What Is The Difference Between Regression & Classification ?
- What Does The Area Under The ROC Curve Indicate ?
- What Is A Neural Network ?
- What Is An Outlier ?
- What Is Another Name For A Bayesian Network ?
- What Is Ensemble Learning ?
Set-5:
- What Is Clustering ?
- How Would You Define Collinearity ?
- What Is Overfitting ?
- What Is A Bayesian Network ?
- What Is A Time Series ?
- What Is Dimensionality Reduction In ML ?
- What Is Underfitting ?
- What Is Sensitivity ?
- What Is Specificity ?
- What Is The Difference Between Stochastic Gradient Descent & Gradient Descent Algorithm ?
Set-6:
- Explain Decision Tree In ML ?
- Why Is The Naive Bayes Method ‘Naive’ ?
- State The Bayes Theorem For Naive Bayes Algorithm.
- How Would You Define Precision & Recall ?
- What Are Some Tools Used To Discover Outliers ?
- Explain Kernel In SVM .
- What Are Different Types Of Clustering Algorithms ?
- How Would You Describe Reinforcement Learning ?
- What Are Content-Based Filtering & Collaborative Filtering ?
- What Is Deductive Learning & Inductive Learning ?
Set-7:
- How Do You Differentiate Data Mining Vs. Machine Learning ?
- Why Is The ROC Curve Important ?
- Why Does Overfitting Occur In ML ?
- What Are Some Functions Of Unsupervised Learning ?
- What Are Some Functions Of Supervised Learning ?
- What Are Two Components Of Bayesian Logic ?
- How Would You Describe A Recommender System ?
- What Is Regularization In ML?
- Advantages & Disadvantages Of Decision Trees ?
- What Do You Understand About The Exploding Gradient Problem In Machine Learning ?
Set-8:
- What Is The Vanishing Gradient Problem In ML ?
- What Do You Understand About The Bias & Variance Tradeoff ?
- How Would You Describe F1 Score And How Would You Use It ?
- Explain The Difference Between Loss Function & Cost Function ?
- How Would You Handle Outlier Values ?
- What Is A Random Forest & How Does It Work ?
- What Ensemble Techniques Can Be Used To Aggregate Multiple Models ?
- What Methods Can Be Used To Find The Threshold Of A Classifier ?
- How Can You Check Normality Of A Dataset ?
- How Can You Differentiate Between A Parametric & Non-Parametric Model ?
Set-9:
- How Can Logistic Regression Be Used For More Than Two Classes ?
- What Difference Exists Between Softmax & Sigmoid Functions ?
- How To Avoid Overfitting In ML Models ?
- Which Is Better To Have: A False Positive Or A False Negative ?
- How Would You Handle A Dataset Suffering From High Variance ?
- What Are Some Classification Methods That SVM Can Handle ?
- Why Do You Think An Instance-Based Learning Algorithm Is Sometimes Referred To As A Lazy Learning Algorithm ?
- Explain The Reason For Pruning In A Decision Tree ?
- How Does Regularization Reduce The Cost Term ?
- What Is The Need To Convert Categorical Variables To Factors ?
Set-10:
- Do You Believe Treating A Categorical Variable As A Continuous Variable Will Result In A Better Predictive Model ?
- Why Do We Need The Confusion Matrix ?
- Difference Between Gradient Boosting & Random Forest ?
- How Does The Box-Cox Transformation Work ?
- How Is Data Divided In Cross Validation ?
- What Are Support Vectors In SVM ?
- What Are The Different Methods To Split A Tree In The Decision Tree Algorithm ?
- How Does The Support Vector Machine Algorithm Help Self-Learning ?
- How To Choose The Optimal Number Of Clusters ?
- What Is Feature Engineering ? How Does It Affect Model Performance ?
Set-11:
- Why Do We Perform Normalization ?
- What Is The Difference Between Upsampling & Downsampling ?
- What Is Data Leakage And How To Identify It ?
- What Are Some Of The Hyperparameters Of The Random Forest Regressor Which Help To Avoid Overfitting ?
- Is It Always Necessary To Use 80:20 Ratio For The Train Test Split ?
- What Is One-Shot Learning ?
- What Is The Difference Between Manhattan Distance And Euclidean Distance ?
- What Is The Difference Between One-Hot Encoding & Ordinal Encoding ?
- Explain The Working Principle Of SVM .
- How Is Random Forest Robust To Outliers ?
Set-12:
- How To Handle Data Imbalance In Machine Learning ?
- Is The Accuracy Score Always A Good Metric To Measure The Performance Of A Classification Model ?
- What Is KNN Imputer And How Does It Work ?
- Explain The Working Procedure Of The XGBoost Model ?
- What Is Linear Discriminant Analysis ?
- How Can You Visualize High Dimensional Data In 2-D ?
- What Is The Reason Behind The Curse Of Dimensionality ?
- Which Metric Is More Robust To Outliers: MAE, MSE, Or RMSE ?
- How Would You Assess The Goodness Of Fit For A Linear Regression Model ?
- What Is The Null Hypothesis In A Linear Regression Model ?
Set-13:
- Can SVMs Be Used For Both Classification & Regression Tasks ?
- Explain the concept of weighting in KNN. What are the different ways to assign weights, and how do they affect the model’s predictions?
- What is the concept of information gain in decision trees? How does it guide the creation of the tree structure?
- How does the independence assumption affect the accuracy of a Naive Bayes classifier?
- Why does PCA maximize the variance in the data?
- How do you evaluate the effectiveness of a machine learning model in an imbalanced dataset scenario? What metrics would you use instead of accuracy?
- How does the One-Class SVM algorithm work for anomaly detection?
- Explain the concept of “concept drift” in anomaly detection.

