Advanced Machine Learning Challenge
1. What are the steps for using a gradient descent algorithm?
1. Calculate the error between the actual value and the predicted value
2. Reiterate until you find the best weights of the network
3. Pass an input through the network and get values from the output layer
4. Initialize random weights and biases
5. Go to each neuron which contributes to the error and change its respective values to reduce the error
Ans: 4, 3, 1, 5, 2
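The answer's ordering can be sketched as a minimal gradient descent loop; the toy data, learning rate, and iteration count below are illustrative assumptions, not part of the question:

```python
import numpy as np

# Toy data: the "network" should learn y = 2x
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X

rng = np.random.default_rng(0)
w, b = rng.normal(), rng.normal()    # step 4: initialize random weight and bias
lr = 0.05

for _ in range(500):                 # step 2: reiterate until weights are good
    y_pred = w * X + b               # step 3: forward pass, get output values
    error = y_pred - y               # step 1: error between predicted and actual
    grad_w = 2 * np.mean(error * X)  # step 5: adjust each parameter that
    grad_b = 2 * np.mean(error)      #         contributes to the error
    w -= lr * grad_w
    b -= lr * grad_b
```

After the loop, `w` approaches 2 and `b` approaches 0.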
2. Binning is the process of transforming numerical variables into categorical counterparts.
Ans: True
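A common way to do this binning is `pandas.cut`; the bin edges and labels below are illustrative:

```python
import pandas as pd

ages = pd.Series([5, 17, 25, 42, 68])
# Transform the numerical variable into categorical bins
groups = pd.cut(ages, bins=[0, 18, 40, 100], labels=["child", "adult", "senior"])
```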
3. What is Decision Tree?
Ans: A flow-chart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label
4. Examples of the Naive Bayes algorithm are:
Ans: Spam filtration
Sentiment analysis
Classifying articles
5. Additive model for time series: Y = . . .
Ans: Y = T + S + C + I
6. For Ridge Regression, if the regularization parameter is very high, which options are true?
Ans: Large coefficients are significantly penalized and shrink toward zero
7.If a relevant variable is omitted from a regression equation, the consequences would be that:
Ans:The standard errors would be biased
If the excluded variable is uncorrelated with all of the included variables, all of the slope coefficients will be consistent and unbiased
If the excluded variable is uncorrelated with all of the included variables, the intercept coefficient will be inconsistent
8. Which of the following is incorrect about Hierarchical Clustering?
Ans: It is not similar to the distance phylogenetic tree-building method
9. Suppose you want to apply the AdaBoost algorithm on data D which has T observations. You set half the data for training and half for testing initially. Now you want to increase the number of training data points T1, T2 … Tn where T1 < T2 < … < Tn-1 < Tn. Which of the following is true about training and testing error in this case?
Ans:The difference between training error and test error decreases as number of observations increases
10. What is the naive assumption in a Naive Bayes Classifier?
Ans:All the features of a class are independent of each other
11. Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge regression with penalty x. Choose the option which best describes the bias.
Ans: In case of very large x, bias is high
12.Which of the following techniques perform similar operations as dropout in a neural network?
Ans: Bagging
13.In boosting trees, individual weak learners are independent of each other
Ans: False
14.One of the key ideas for solving PCA with eigenvalue decomposition is that a symmetric matrix can be diagonalized by an orthogonal matrix of its eigenvectors.
Ans: True
15.What happens when model complexity increases?
Ans:Variance of the model increases
16. Misclassification would happen when you use a very small C (C ~ 0)
Ans: True
17. Decision tree is a ____.
Ans:Non-linear ML technique
Supervised Learning technique
18. The logistic model is estimated by way of?
Ans:Maximum likelihood estimation
19.Logistic regression is used when you want to:
Ans:Predict a dichotomous variable from continuous or dichotomous variables.
20.The fundamental unit of network is
Ans: Neuron
21.Support vector machine is used for
Ans:Classification
Regression
Outlier Detection Purposes
22.Multivariate means more than one variable behind the resultant outcome.
Ans: True
23. Which of the following code snippets is used to import OneHotEncoder?
Ans:from sklearn.preprocessing import OneHotEncoder
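A short usage sketch of that import (the color data is illustrative); `fit_transform` returns a sparse matrix, which `.toarray()` densifies:

```python
from sklearn.preprocessing import OneHotEncoder

X = [["red"], ["green"], ["blue"], ["green"]]
enc = OneHotEncoder()
# One column per category (sorted: blue, green, red), exactly one 1 per row
onehot = enc.fit_transform(X).toarray()
```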
24.What is/are the main implementation strategies of collaborative filtering model?
Ans: All of the Above
25. NLP is concerned with the interactions between computers and human (natural) languages.
Ans: True
26.A _________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
Ans:Decision tree
27. Which of the following are possible situations while using Isolation Forest?
Ans:When the score of the observation is close to 1, the path length is very small, and then the data point is easily isolated. We have an anomaly.
When the score of that observation is smaller than 0.5, the path length is large, and then we have a normal data point.
If all the observations have an anomaly score around 0.5, then the entire sample doesn’t have any anomaly.
28.Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?
Ans:The model would consider only the points close to the hyperplane for modeling
29.Linear models such as linear regression, SVMs with linear kernel, etc follow the linearity principle that two or more variables can be added together so that their sum is also a solution.
Ans: True
30. The agglomerative approach is also known as
Ans:Bottom-up Approach
31. If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance.
Ans:False
32. The ________ technique looks at the meaning of the word.
Ans: Lemmatization
33.Code to import PCA in Scikit-Learn.
Ans:from sklearn.decomposition import PCA
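A minimal usage sketch of that import; the toy points below (assumed data, lying almost along a line) show one component capturing nearly all the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Nearly collinear 2-D points: one principal component is enough
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)  # project onto the first component
```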
34.Seasonal variation means the variation occurring within:
Ans: Parts of a year
35.A procedure used for finding the equation of a straight line which provides the best approximation for the relationship between the independent and dependent variables is
Ans:the least squares method
36. Which approach uses the memory of previous user interactions to compute user similarities based on items they've interacted with?
Ans:Memory-based
37. From a Dirichlet distribution Dir(α), we draw a random sample representing the topic distribution, or topic mixture, of a particular document.
Ans: True
38. In k-NN, the algorithms used to compute the nearest neighbors are:
Ans:Ball Tree
kd tree
brute
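These three options map directly to scikit-learn's `algorithm` parameter; a small sketch with assumed toy data:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]
# 'algorithm' may be 'ball_tree', 'kd_tree', 'brute', or 'auto'
knn = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree")
knn.fit(X, y)
pred = knn.predict([[1.5], [10.5]])
```

All three choices give the same answers; they differ only in how the neighbor search is carried out.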
39.What do you mean by generalization error in terms of the SVM?
Ans:How accurately the SVM can predict outcomes for unseen data
40.Naive Bayes is :
Ans:Fast to train and fast to use.
41. Do hierarchical clustering algorithms suffer from the problem of convergence at local optima?
Ans: True
42.Which of the following will be true about k in k-NN in terms of Bias?
Ans: When you increase k, the bias increases
43.What is the advantage of hierarchical clustering over K-means clustering?
Ans:You don't have to assign the number of clusters from the beginning in the case of hierarchical clustering
44. Every hyperparameter, if set poorly, can have a huge impact on training, and so all hyperparameters are about equally important to tune well.
Ans: False
45. Choose from the following areas where NLP can be useful.
Ans:All of the mentioned
46.Which one of the following statements is TRUE for a Decision Tree?
Ans:In a decision tree, the entropy of a node decreases as we go down a decision tree.
47.Which of the following clustering algorithms suffers from the problem of convergence at local optima?
Ans:K- Means clustering algorithm
Expectation-Maximization clustering algorithm
48.Statement 1: The cost function is altered by adding a penalty equivalent to the square of the magnitude of the coefficients
Statement 2: Ridge and Lasso regression are some of the simple techniques to reduce model complexity and prevent overfitting which may result from simple linear regression.
Ans:Both Statements are true
49.Which of the following are the pros of Decision Trees?
Ans:Possible Scenarios can be added
Uses a white-box model: if a particular result is provided by a model, the explanation is easily replicated
Best, worst, and expected values can be determined for different scenarios
50.Which of the following is correct use of cross validation?
Ans:All of the mentioned
51. For a large k value, the k-nearest neighbor model becomes _____ and ______.
Ans:Simple model, Underfit
52. Explainability and interpretability aren't used interchangeably.
Ans:False
53. Overfitting is more likely when there is a huge amount of training data.
Ans: False
54.The effectiveness of an SVM depends upon
Ans:Selection of Kernel
Kernel Parameters
55.Which of the following are data types of anomaly detection?
Ans:Outliers
Interquartile range
Spike
Level shift
56. Which of the following algorithms doesn't use learning rate as one of its hyperparameters?
Ans:Extra Trees
Random Forest
57.For which of the following problems would anomaly detection be a suitable algorithm?
Ans:From a large set of primary care patient records, identify individuals who might have unusual health conditions.
Given a dataset of credit card transactions, identify unusual transactions to flag them as possibly fraudulent.
58. ________ is an algorithm used for continuous target variables in regression problems with Decision Trees.
Ans:Reduction in Variance
59. PCA is a technique for _______.
Ans:Dimensionality Reduction
Feature Extraction
60.The normal distribution is a probability distribution over all the real numbers.
Ans: True
61.Which of the following statement(s) can be true post adding a variable in a linear regression model?
Ans:R-Squared and Adjusted R-squared both increase
62. What would be the consequences for the OLS estimator if heteroscedasticity is present in a regression model but ignored?
Ans:It will be inefficient
63.Select the appropriate option which describes the Single Linkage method.
Ans:In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points in each cluster.
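The single-linkage definition can be written directly as code; this is a sketch with made-up cluster points, not a full clustering implementation:

```python
from itertools import product
import math

def single_linkage(cluster_a, cluster_b):
    """Single-linkage distance: the shortest distance between any
    point in one cluster and any point in the other."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

# Closest pair across the two clusters is (1, 0) and (4, 0), distance 3
d = single_linkage([(0, 0), (1, 0)], [(4, 0), (9, 9)])
```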
64.Large values of the log-likelihood statistic indicate:
Ans:That the statistical model is a poor fit of the data.
65.Tree Interpreters is an example of
Ans:Model- Specific Approach
66. How can we avoid overfitting in a Decision Tree?
Ans: Both of the above
67.Discriminative models :
Ans:Estimate parameters directly from training data.
68. Which of the following steps/assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?
Ans:The polynomial degree
69. In Ridge regression, a hyperparameter called “_____________” controls the weighting of the penalty in the loss function.
Ans: Lambda
70. Suppose you have trained an anomaly detection system for fraud detection that flags anomalies when p(x) is less than ε, and you find on the cross-validation set that it is missing many fraudulent transactions (i.e., failing to flag them as anomalies). What should you do?
Ans:Increase ε
71.Which of the following is required by K-means clustering?
Ans:all of the mentioned
72.Model-based CF algorithm is/are___________
Ans:All of The Above
73. The process of learning, recognizing, and extracting topics across a collection of documents is called
Ans:Topic Modeling
74.Which of the following are advantages of PCA?
Ans:Removes redundant features and noise
Less storage space required
75.Logistic Regression is a ______ regression technique that is used to model data having a _______ outcome.
Ans:Non-linear , binary
76.Why do we need biological neural networks?
Ans:all of the mentioned
77.Which of the following assumptions are required to show the consistency, unbiasedness and efficiency of the OLS estimator?
Ans:E(ut) = 0
Var(ut) = σ2
Cov(ut, ut-j) = 0 ∀ j
78.Assume, you want to cluster 7 observations into 3 clusters using K-Means clustering algorithm. After first iteration clusters, C1, C2, C3 has following observations: C1: {(2,2), (4,4), (6,6)} C2: {(0,4), (4,0)} C3: {(5,5), (9,9)} What will be the cluster centroids if you want to proceed for second iteration?
Ans:C1: (4,4), C2: (2,2), C3: (7,7)
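The answer follows from recomputing each centroid as the mean of its member points; a quick check with the observations from the question:

```python
import numpy as np

C1 = np.array([(2, 2), (4, 4), (6, 6)])
C2 = np.array([(0, 4), (4, 0)])
C3 = np.array([(5, 5), (9, 9)])
# K-Means update step: new centroid = mean of the cluster's points
centroids = [c.mean(axis=0) for c in (C1, C2, C3)]
```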
79.Spam email detection comes under which domain?
Ans:Text Classification
80.What is/are the primary goals of Feature Engineering?
Ans:All of the above
81.LDA is a
Ans:Generative probabilistic process
82.Which of the following options are true?
Ans:You don’t need to initialize parameters in PCA
PCA can’t be trapped into local minima problem
83.Full Form of LIME
Ans:Local Interpretable Model-Agnostic Explanations
84. In the Naive Bayes equation P(C|X) = (P(X|C) * P(C)) / P(X), which part represents the "likelihood"?
Ans: P(X|C)
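A worked example of the equation with hypothetical numbers (the spam/“offer” figures below are invented for illustration): say 40% of emails are spam, the word “offer” appears in 60% of spam, and in 30% of all emails overall.

```python
p_c = 0.40           # prior P(C): fraction of emails that are spam
p_x_given_c = 0.60   # likelihood P(X|C): "offer" appears given spam
p_x = 0.30           # evidence P(X): "offer" appears in any email

# Bayes' rule: posterior = likelihood * prior / evidence
p_c_given_x = p_x_given_c * p_c / p_x  # = 0.24 / 0.30 = 0.8
```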
85.What are the possible constituent models of a hybrid recommender system (check all that apply)?
Ans:Collaborative Filtering
Content-Based Filtering
86.Which of the following options is/are true for K-fold cross-validation? 1.)Increase in K will result in higher time required to cross validate the result. 2.)Higher values of K will result in higher confidence on the cross-validation result as compared to lower value of K. 3.)If K=N, then it is called Leave one out cross validation, where N is the number of observations.
Ans:1,2,3
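Statement 3 can be checked directly in scikit-learn: with K = N, `KFold` produces the same number of splits as `LeaveOneOut` (the six-observation dataset is an assumed example):

```python
from sklearn.model_selection import KFold, LeaveOneOut

X = list(range(6))  # N = 6 observations
# K = N folds is exactly leave-one-out cross-validation
n_kfold = KFold(n_splits=6).get_n_splits(X)
n_loo = LeaveOneOut().get_n_splits(X)
```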
87.A document is a distribution over words.
Ans:False
88.What is back propagation?
Ans:It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
89.The “elkan” variation of k-means is more efficient on data with well-defined clusters, by using the triangle inequality. However it’s more memory intensive due to the allocation of an extra array of shape (n_samples, n_clusters).
Ans: True
90.Logistic regression assumes a:
Ans:Linear relationship between continuous predictor variables and the logit of the outcome variable.
91.Which of the following statement is TRUE?
Ans:The nature of our business problem determines how outliers are used.
92.In the measurement of the secular trend, the moving averages:
Ans:Smooth out the time series
93. In Lasso and Ridge regression, as the alpha value increases, the slope of the regression line reduces and becomes horizontal.
Ans:True
94.Regarding bias and variance, which of the following statements are true? (Here ‘high’ and ‘low’ are relative to the ideal model.)
Ans:Models which overfit have a low bias.
Models which underfit have a low variance
95. Which of the following statements are true regarding the k-NN algorithm?
Ans:K-NN is a non-parametric algorithm
It is also called a lazy learner algorithm
It is robust to the noisy training data
96.A rise in prices before Eid is an example of
Ans:Seasonal Trend
97.Which of the following can act as possible termination conditions in K-Means?
Ans:For a fixed number of iterations.
Assignment of observations to clusters does not change between iterations (except for cases with a bad local minimum).
Centroids do not change between successive iterations.
Terminate when RSS falls below a threshold.
98. We use validation and test sets to avoid bias and variance.
Ans:True
99. Which of the following methods is used for trainControl resampling?
Ans:repeatedcv
100.Which of the following will be Euclidean Distance between the two data point A(1,3) and B(2,3)?
Ans: 1
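The answer follows from the Euclidean distance formula, which the standard library computes directly:

```python
import math

# Euclidean distance between A(1, 3) and B(2, 3):
# sqrt((2 - 1)**2 + (3 - 3)**2) = sqrt(1) = 1
d = math.dist((1, 3), (2, 3))
```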
101. Which data mining technique is most suitable for categorical variables?
Ans:Decision tree
102.The fundamental unit of network is
Ans: Neuron
103. It is possible that the assignment of observations to clusters does not change between successive iterations in K-Means.
Ans:True
104. A sudden transaction of a huge amount from a credit card falls into which category of anomaly?
Ans:Point anomaly
105. Suppose a model demonstrates high variance across different training sets. Which of the following is not a valid way to reduce the variance?
Ans: Improve the optimization algorithm being used for error minimisation
106.Modern NLP algorithms are based on machine learning, especially statistical machine learning.
Ans:True
107.A typical example of Memory-based approach is User Neighbourhood-based CF.
Ans:True
108. Which of the following are attribute selection methods for splitting a Decision Tree?
Ans:Information Gain
Gini index
Gain ratio
109.Why are SVMs fast?
Ans:Quadratic optimization (convex!)
They work in the dual, with relatively few points
The kernel trick
110. A set of observations recorded at equal intervals of time is called
Ans:Time series data
111.Which of the following are true about isolation forest?
Ans:Identifies anomalies as the observations with short average path lengths
Isolation forest is built based on ensembles of decision trees.
Isolation forest needs an anomaly Score to have an idea of how anomalous a data point is
Splits the data points by randomly selecting a value between the maximum and the minimum of the selected feature.
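A minimal sketch of these ideas with scikit-learn's `IsolationForest`; the synthetic data, contamination value, and random seeds are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0, scale=1, size=(100, 2))  # dense "normal" cloud
outlier = np.array([[8.0, 8.0]])                    # easily isolated point
X = np.vstack([normal, outlier])

forest = IsolationForest(random_state=0, contamination=0.01)
labels = forest.fit_predict(X)  # -1 = anomaly, 1 = normal
```

The far-away point has a short average path length across the trees, so it is the one flagged as an anomaly.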
112.What types of error does bias cause in a model?
Ans:Over Generalization
Underfitting
113.Which of the following metrics, do we have for finding dissimilarity between two clusters in hierarchical clustering? 1.Single-link 2.Complete-link 3. Average-link
Ans:1,2 and 3
114. The agglomerative approach is called the Top-Down Approach, whereas the divisive approach is called the Bottom-Up Approach.
Ans:False
115.The amount of output of one unit received by another unit depends on what?
Ans:weight
116.Which of the following statements about Naive Bayes are correct?
Ans:attributes are equally important.
attributes are statistically independent of one another given the class value.
attributes can be nominal or numeric
117.Which of the following is true about regularized regression?
Ans:Can help with bias trade-off
118. SVMs are more effective when
Ans:The data is linearly separable
The data is clean and ready to use
119.The log likelihood is parallel to?
Ans:The F-test in OLS regression
120.Standardisation of features is required before training a logistic regression.
Ans:False
121.Which of the following includes major tasks of NLP?
Ans:All of the mentioned
122. Content-Based Filtering depends only on the user's previous choices, making this method robust in avoiding the cold-start problem.
Ans:True
123.A time series data is a set of data recorded at
Ans:All the above
124.Which of the following distance metric can be used in k-NN?
Ans:Manhattan
Minkowski
Tanimoto
Jaccard
125.The hierarchical clustering method generates a similarity score [S(X,Y)] for all gene combinations, places the scores in a matrix, joins those genes that have the highest score, and then continues to join progressively less similar pairs.
Ans:True
126.Which of the following is the second goal of PCA?
Ans:Data Compression
127.Association rule approach is taken by decision tree for knowledge learning.
Ans:False
128.Which of the following is not an assumption of Linear Regression?
Ans:Multicollinearity
129.Topic modelling refers to the task of identifying documents that best describes a set of topics.
Ans:False
130.Support vectors are the data points that lie farthest to the decision surface.
Ans:False
131. Finding good hyperparameters is a time-consuming process, so typically you should do it once at the beginning of the project and try to find the best hyperparameters so that you don't have to revisit tuning them again.
Ans:False
132.Disadvantages of Naive Bayes Classifier:
Ans:Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship between features.
133. Neither XGBoost nor CatBoost has an inbuilt method for categorical features; encoding (one-hot, target encoding, etc.) should be performed by the user.
Ans:False
134.Select the appropriate option which describes the Single Linkage method.
Ans:In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points in each cluster.
135.Which of the following methods do we use to find the best fit line for data in Linear Regression?
Ans:Least Square Error
136.Irreducible error can be removed by increasing the model complexity.
Ans:False
137.The time series analysis helps:
Ans:All the above
138. What is the right order of components for a text classification model? 1. Text cleaning 2. Text annotation 3. Gradient descent 4. Model tuning 5. Text to predictors
Ans: 1, 2, 5, 3, 4
139.Why ML Explainability is important?
Ans:All of the Above
140. In k-NN, what happens when you increase/decrease the value of k?
Ans:The boundary becomes smoother with increasing value of K
141.Jaccard distance is a metric for comparing two binary data strings. While comparing two binary strings of equal length, Jaccard distance is the number of bit positions in which the two bits are different.
Ans:False
142.The curse of dimensionality refers to all the problems that arise working with data in the higher dimensions.
Ans:True
143.k-means clustering is not a method of vector quantization
Ans:False
144.Which of the following are true? Check all that apply
Ans:If you do not have any labeled data (or if all your data has label y = 0), then it is still possible to learn p(x), but it may be harder to evaluate the system or choose a good value of ϵ.
When choosing features for an anomaly detection system, it is a good idea to look for features that take on unusually large or small values for (mainly the) anomalous examples.
145.Which of the following is a Latent Factor Model?
Ans:Singular Value Decomposition
146. For K-fold cross-validation, smaller k implies less variance.
Ans:True
147. Which of the following statements is NOT TRUE about k-means?
Ans: Number of clusters to be built is typically a user input and it impacts the way clusters are created
148.How do you improve random forest accuracy?
Ans:Algorithm Tuning
Add more data
Feature Selection
149.PCA works better If the data lies on a curved surface and not on a flat surface.
Ans:False
150. What would be the relation between the time taken by 1-NN, 2-NN, and 3-NN?
Ans:1-NN ~ 2-NN ~ 3-NN
151.What is the field of Natural Language Processing (NLP)?
Ans:All of the mentioned
152. Which regularization is used to reduce the overfitting problem?
Ans:Both
153.PCA is mostly used for ______________.
Ans:Unsupervised Learning
154.Random Forest learns a coefficient for each input feature, which shows how much this feature influences the target feature. True/False?
Ans:False
155. What does "Naive" in Naive Bayes classifier refer to?
Ans: Strong independence assumption between the features/variables.
156. Logarithmic transformation helps to handle skewed data; after transformation, the distribution becomes closer to normal.
Ans:True
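A quick sketch of the idea with `numpy.log1p` (log(1 + x), which also handles zeros); the right-skewed sample values are an assumption for illustration:

```python
import numpy as np

skewed = np.array([1, 2, 3, 10, 100, 1000], dtype=float)
# The long right tail (values up to 1000) is compressed to under 7,
# while the ordering of the values is preserved
transformed = np.log1p(skewed)
```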
157.Statement 1: The cost function is altered by adding a penalty equivalent to the square of the magnitude of the coefficients
Statement 2: Ridge and Lasso regression are some of the simple techniques to reduce model complexity and prevent overfitting which may result from simple linear regression.
Ans:Both Statements are true
158. What will happen if you don't rotate the components?
Ans: The effect of PCA will diminish
159.The component of a time series which is attached to short term variation is:
Ans:Seasonal variation
160. In SVM, if the number of input features is 3, then the hyperplane is a ____.
Ans:Plane
161.LDA is a
Ans:Generative probabilistic process
162.Full form of SHAP
Ans:Shapley Additive Explanations
163.Which of the following is a disadvantage of decision trees?
Ans:Decision trees are prone to be overfit
164. In logistic regression, what do we estimate for each one-unit change in X?
Ans:How much the natural logarithm of the odds for Y = 1 changes
165.Feature engineering is a process of transforming the given data into a form which is easier to interpret.
Ans:True
166.A single document invokes multiple topics.
Ans:True
167. Which of the following is true when you choose the fraction of observations for building the base learners in a tree-based algorithm?
Ans: Decreasing the fraction of samples used to build the base learners will result in a decrease in variance
168.Which of the following neural networks uses supervised learning? (A) Multilayer perceptron (B) Self organizing feature map (C) Hopfield network
Ans:(A) only