Advanced Machine Learning Challenge
1. What are the steps for using a gradient descent algorithm?
1. Calculate the error between the actual value and the predicted value
2. Reiterate until you find the best weights of the network
3. Pass an input through the network and get values from the output layer
4. Initialize random weights and biases
5. Go to each neuron which contributes to the error and change its respective values to reduce the error
Ans: 4, 3, 1, 5, 2
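The answer's ordering can be sketched as a minimal gradient descent loop; the toy data, learning rate, and iteration count below are illustrative assumptions, not part of the question:

```python
import numpy as np

# Toy data: the "network" should learn y = 2x
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X

rng = np.random.default_rng(0)
w, b = rng.normal(), rng.normal()    # step 4: initialize random weight and bias
lr = 0.05

for _ in range(500):                 # step 2: reiterate until weights are good
    y_pred = w * X + b               # step 3: forward pass, get output values
    error = y_pred - y               # step 1: error between predicted and actual
    grad_w = 2 * np.mean(error * X)  # step 5: adjust each parameter that
    grad_b = 2 * np.mean(error)      #         contributes to the error
    w -= lr * grad_w
    b -= lr * grad_b
```

After the loop, `w` approaches 2 and `b` approaches 0.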
2. Binning is the process of transforming numerical variables into categorical counterparts.
Ans: True
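A common way to do this binning is `pandas.cut`; the bin edges and labels below are illustrative:

```python
import pandas as pd

ages = pd.Series([5, 17, 25, 42, 68])
# Transform the numerical variable into categorical bins
groups = pd.cut(ages, bins=[0, 18, 40, 100], labels=["child", "adult", "senior"])
```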
3. What is Decision Tree?
Ans: A flow-chart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label
4. Examples of the Naive Bayes algorithm are:
Ans: Spam filtration
Sentiment analysis
Classifying articles
5. Additive model for time series: Y = . . .
Ans: Y = T + S + C + I
6. For Ridge Regression, if the regularization parameter is very high, which options are true?
Ans: Large coefficients are significantly penalized and shrink toward zero
7.If a relevant variable is omitted from a regression equation, the consequences would be that:
Ans:The standard errors would be biased
If the excluded variable is uncorrelated with all of the included variables, all of the slope coefficients will be consistent and unbiased
If the excluded variable is uncorrelated with all of the included variables, the intercept coefficient will be inconsistent
8. Which of the following is incorrect about Hierarchical Clustering?
Ans: It is not similar to the distance phylogenetic tree-building method
9. Suppose you want to apply the AdaBoost algorithm on data D which has T observations. You set half the data for training and half for testing initially. Now you want to increase the number of training data points T1, T2 … Tn where T1 < T2 < … < Tn-1 < Tn. Which of the following is true about training and testing error in this case?
Ans:The difference between training error and test error decreases as number of observations increases
10. What is the naive assumption in a Naive Bayes Classifier?
Ans:All the features of a class are independent of each other
11. Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge regression with penalty x. Choose the option which best describes the bias.
Ans: In case of very large x, bias is high
12.Which of the following techniques perform similar operations as dropout in a neural network?
Ans: Bagging
13.In boosting trees, individual weak learners are independent of each other
Ans: False
14.One of the key ideas for solving PCA with eigenvalue decomposition is that a symmetric matrix can be diagonalized by an orthogonal matrix of its eigenvectors.
Ans: True
15.What happens when model complexity increases?
Ans:Variance of the model increases
16. Misclassification would happen when you use a very small C (C ~ 0)
Ans: True
17. Decision tree is a ____.
Ans:Non-linear ML technique
Supervised Learning technique
18. The logistic model is estimated by way of?
Ans:Maximum likelihood estimation
19.Logistic regression is used when you want to:
Ans:Predict a dichotomous variable from continuous or dichotomous variables.
20.The fundamental unit of network is
Ans: Neuron
21.Support vector machine is used for
Ans:Classification
Regression
Outlier Detection Purposes
22.Multivariate means more than one variable behind the resultant outcome.
Ans: True
23. Which of the following code snippets is used to import OneHotEncoder?
Ans:from sklearn.preprocessing import OneHotEncoder
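A short usage sketch of that import (the color data is illustrative); `fit_transform` returns a sparse matrix, which `.toarray()` densifies:

```python
from sklearn.preprocessing import OneHotEncoder

X = [["red"], ["green"], ["blue"], ["green"]]
enc = OneHotEncoder()
# One column per category (sorted: blue, green, red), exactly one 1 per row
onehot = enc.fit_transform(X).toarray()
```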
24.What is/are the main implementation strategies of collaborative filtering model?
Ans: All of the Above
25. NLP is concerned with the interactions between computers and human (natural) languages.
Ans: True
26.A _________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
Ans:Decision tree
27. Which of the following are possible situations while using Isolation Forest?
Ans:When the score of the observation is close to 1, the path length is very small, and then the data point is easily isolated. We have an anomaly.
When the score of that observation is smaller than 0.5, the path length is large, and then we have a normal data point.
If all the observations have an anomaly score around 0.5, then the entire sample doesn’t have any anomaly.
28.Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?
Ans:The model would consider only the points close to the hyperplane for modeling
29.Linear models such as linear regression, SVMs with linear kernel, etc follow the linearity principle that two or more variables can be added together so that their sum is also a solution.
Ans: True
30. The agglomerative approach is also known as
Ans:Bottom-up Approach
31. If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance.
Ans:False
32. The ________ technique looks at the meaning of the word.
Ans: Lemmatization
33.Code to import PCA in Scikit-Learn.
Ans:from sklearn.decomposition import PCA
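A minimal usage sketch of that import; the toy points below (assumed data, lying almost along a line) show one component capturing nearly all the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Nearly collinear 2-D points: one principal component is enough
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)  # project onto the first component
```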
34.Seasonal variation means the variation occurring within:
Ans: Parts of a year
35.A procedure used for finding the equation of a straight line which provides the best approximation for the relationship between the independent and dependent variables is
Ans:the least squares method
36. Which approach uses the memory of previous user interactions to compute user similarities based on items they've interacted with?
Ans:Memory-based
37. From a Dirichlet distribution Dir(α), we draw a random sample representing the topic distribution, or topic mixture, of a particular document.
Ans: True
38. In k-NN, the algorithms used to compute the nearest neighbors are:
Ans:Ball Tree
kd tree
brute
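These three options map directly to scikit-learn's `algorithm` parameter; a small sketch with assumed toy data:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]
# 'algorithm' may be 'ball_tree', 'kd_tree', 'brute', or 'auto'
knn = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree")
knn.fit(X, y)
pred = knn.predict([[1.5], [10.5]])
```

All three choices give the same answers; they differ only in how the neighbor search is carried out.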
39.What do you mean by generalization error in terms of the SVM?
Ans:How accurately the SVM can predict outcomes for unseen data
40.Naive Bayes is :
Ans:Fast to train and fast to use.
41. Do hierarchical clustering algorithms suffer from the problem of convergence at local optima?
Ans: True
42.Which of the following will be true about k in k-NN in terms of Bias?
Ans: When you increase k, the bias increases
43.What is the advantage of hierarchical clustering over K-means clustering?
Ans:You don't have to assign the number of clusters from the beginning in the case of hierarchical clustering
44. Every hyperparameter, if set poorly, can have a huge impact on training, and so all hyperparameters are about equally important to tune well.
Ans: False
45. Choose from the following areas where NLP can be useful.
Ans:All of the mentioned
46.Which one of the following statements is TRUE for a Decision Tree?
Ans:In a decision tree, the entropy of a node decreases as we go down a decision tree.
47.Which of the following clustering algorithms suffers from the problem of convergence at local optima?
Ans:K- Means clustering algorithm
Expectation-Maximization clustering algorithm
48.Statement 1: The cost function is altered by adding a penalty equivalent to the square of the magnitude of the coefficients
Statement 2: Ridge and Lasso regression are some of the simple techniques to reduce model complexity and prevent overfitting which may result from simple linear regression.
Ans:Both Statements are true
49.Which of the following are the pros of Decision Trees?
Ans:Possible Scenarios can be added
Uses a white-box model: if a particular result is provided by a model, the explanation is easily replicated
Best, worst, and expected values can be determined for different scenarios
50.Which of the following is correct use of cross validation?
Ans:All of the mentioned
51. For a large k value, the k-nearest neighbor model becomes _____ and ______.
Ans:Simple model, Underfit
52. Explainability and interpretability aren't used interchangeably.
Ans:False
53. Overfitting is more likely when there is a huge amount of training data.
Ans: False
54.The effectiveness of an SVM depends upon
Ans:Selection of Kernel
Kernel Parameters
55.Which of the following are data types of anomaly detection?
Ans:Outliers
Interquartile range
Spike
Level shift
56. Which of the following algorithms doesn't use learning rate as one of its hyperparameters?
Ans:Extra Trees
Random Forest
57.For which of the following problems would anomaly detection be a suitable algorithm?
Ans:From a large set of primary care patient records, identify individuals who might have unusual health conditions.
Given a dataset of credit card transactions, identify unusual transactions to flag them as possibly fraudulent.
58. ________ is an algorithm used for continuous target variables in regression problems with Decision Trees.
Ans:Reduction in Variance
59. PCA is a technique for _______.
Ans:Dimensionality Reduction
Feature Extraction
60.The normal distribution is a probability distribution over all the real numbers.
Ans: True
61.Which of the following statement(s) can be true post adding a variable in a linear regression model?
Ans:R-Squared and Adjusted R-squared both increase
62. What would be the consequences for the OLS estimator if heteroscedasticity is present in a regression model but ignored?
Ans:It will be inefficient
63.Select the appropriate option which describes the Single Linkage method.
Ans:In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points in each cluster.
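The single-linkage definition can be written directly as code; this is a sketch with made-up cluster points, not a full clustering implementation:

```python
from itertools import product
import math

def single_linkage(cluster_a, cluster_b):
    """Single-linkage distance: the shortest distance between any
    point in one cluster and any point in the other."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

# Closest pair across the two clusters is (1, 0) and (4, 0), distance 3
d = single_linkage([(0, 0), (1, 0)], [(4, 0), (9, 9)])
```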
64.Large values of the log-likelihood statistic indicate:
Ans:That the statistical model is a poor fit of the data.
65.Tree Interpreters is an example of
Ans:Model- Specific Approach
66. How can we avoid overfitting in a Decision Tree?
Ans: Both of the above
67.Discriminative models :
Ans:Estimate parameters directly from training data.
68. Which of the following steps/assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?
Ans:The polynomial degree
69. In Ridge regression, a hyperparameter called “_____________” controls the weighting of the penalty in the loss function.
Ans: Lambda
70. Suppose you have trained an anomaly detection system for fraud detection that flags anomalies when p(x) is less than ε, and you find on the cross-validation set that it is missing many fraudulent transactions (i.e., failing to flag them as anomalies). What should you do?
Ans:Increase ε
71.Which of the following is required by K-means clustering?
Ans:all of the mentioned
72.Model-based CF algorithm is/are___________
Ans:All of The Above
73. The process of learning, recognizing, and extracting topics across a collection of documents is called
Ans:Topic Modeling
74.Which of the following are advantages of PCA?
Ans:Removes redundant features and noise
Less storage space required
75.Logistic Regression is a ______ regression technique that is used to model data having a _______ outcome.
Ans:Non-linear , binary
76.Why do we need biological neural networks?
Ans:all of the mentioned
77.Which of the following assumptions are required to show the consistency, unbiasedness and efficiency of the OLS estimator?
Ans:E(ut) = 0
Var(ut) = σ2
Cov(ut, ut-j) = 0 ∀ j
78.Assume, you want to cluster 7 observations into 3 clusters using K-Means clustering algorithm. After first iteration clusters, C1, C2, C3 has following observations: C1: {(2,2), (4,4), (6,6)} C2: {(0,4), (4,0)} C3: {(5,5), (9,9)} What will be the cluster centroids if you want to proceed for second iteration?
Ans:C1: (4,4), C2: (2,2), C3: (7,7)
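The answer follows from recomputing each centroid as the mean of its member points; a quick check with the observations from the question:

```python
import numpy as np

C1 = np.array([(2, 2), (4, 4), (6, 6)])
C2 = np.array([(0, 4), (4, 0)])
C3 = np.array([(5, 5), (9, 9)])
# K-Means update step: new centroid = mean of the cluster's points
centroids = [c.mean(axis=0) for c in (C1, C2, C3)]
```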
79.Spam email detection comes under which domain?
Ans:Text Classification
80.What is/are the primary goals of Feature Engineering?
Ans:All of the above
81.LDA is a
Ans:Generative probabilistic process
82.Which of the following options are true?
Ans:You don’t need to initialize parameters in PCA
PCA can’t be trapped into local minima problem
83.Full Form of LIME
Ans:Local Interpretable Model-Agnostic Explanations
84. In the Naive Bayes equation P(C|X) = (P(X|C) * P(C)) / P(X), which part represents the "likelihood"?
Ans: P(X|C)
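A worked example of the equation with hypothetical numbers (the spam/“offer” figures below are invented for illustration): say 40% of emails are spam, the word “offer” appears in 60% of spam, and in 30% of all emails overall.

```python
p_c = 0.40           # prior P(C): fraction of emails that are spam
p_x_given_c = 0.60   # likelihood P(X|C): "offer" appears given spam
p_x = 0.30           # evidence P(X): "offer" appears in any email

# Bayes' rule: posterior = likelihood * prior / evidence
p_c_given_x = p_x_given_c * p_c / p_x  # = 0.24 / 0.30 = 0.8
```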
85.What are the possible constituent models of a hybrid recommender system (check all that apply)?
Ans:Collaborative Filtering
Content-Based Filtering
86.Which of the following options is/are true for K-fold cross-validation? 1.)Increase in K will result in higher time required to cross validate the result. 2.)Higher values of K will result in higher confidence on the cross-validation result as compared to lower value of K. 3.)If K=N, then it is called Leave one out cross validation, where N is the number of observations.
Ans:1,2,3
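Statement 3 can be checked directly in scikit-learn: with K = N, `KFold` produces the same number of splits as `LeaveOneOut` (the six-observation dataset is an assumed example):

```python
from sklearn.model_selection import KFold, LeaveOneOut

X = list(range(6))  # N = 6 observations
# K = N folds is exactly leave-one-out cross-validation
n_kfold = KFold(n_splits=6).get_n_splits(X)
n_loo = LeaveOneOut().get_n_splits(X)
```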
87.A document is a distribution over words.
Ans:False
88.What is back propagation?
Ans:It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
89.The “elkan” variation of k-means is more efficient on data with well-defined clusters, by using the triangle inequality. However it’s more memory intensive due to the allocation of an extra array of shape (n_samples, n_clusters).
Ans: True
90.Logistic regression assumes a:
Ans:Linear relationship between continuous predictor variables and the logit of the outcome variable.
91.Which of the following statement is TRUE?
Ans:The nature of our business problem determines how outliers are used.
92.In the measurement of the secular trend, the moving averages:
Ans:Smooth out the time series
93. In Lasso and Ridge regression, as the alpha value increases, the slope of the regression line reduces and becomes horizontal.
Ans:True
94.Regarding bias and variance, which of the following statements are true? (Here ‘high’ and ‘low’ are relative to the ideal model.)
Ans:Models which overfit have a low bias.
Models which underfit have a low variance
95. Which of the following statements are true regarding the k-NN algorithm?
Ans:K-NN is a non-parametric algorithm
It is also called a lazy learner algorithm
It is robust to the noisy training data
96.A rise in prices before Eid is an example of
Ans:Seasonal Trend
97.Which of the following can act as possible termination conditions in K-Means?
Ans:For a fixed number of iterations.
Assignment of observations to clusters does not change between iterations (except for cases with a bad local minimum).
Centroids do not change between successive iterations.
Terminate when RSS falls below a threshold.
98. We use validation and test sets to avoid bias and variance.
Ans:True
99. Which of the following methods is used for trainControl resampling?
Ans:repeatedcv
100.Which of the following will be Euclidean Distance between the two data point A(1,3) and B(2,3)?
Ans: 1
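The answer follows from the Euclidean distance formula, which the standard library computes directly:

```python
import math

# Euclidean distance between A(1, 3) and B(2, 3):
# sqrt((2 - 1)**2 + (3 - 3)**2) = sqrt(1) = 1
d = math.dist((1, 3), (2, 3))
```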
101. Which data mining technique is most suitable for categorical variables?
Ans:Decision tree
102.The fundamental unit of network is
Ans: Neuron
103. It is possible that the assignment of observations to clusters does not change between successive iterations in K-Means.
Ans:True
104. A sudden transaction of a huge amount from a credit card falls into which category of anomaly?
Ans:Point anomaly
105. Suppose a model demonstrates high variance across different training sets. Which of the following is not a valid way to reduce the variance?
Ans: Improve the optimization algorithm being used for error minimisation
106.Modern NLP algorithms are based on machine learning, especially statistical machine learning.
Ans:True
107.A typical example of Memory-based approach is User Neighbourhood-based CF.
Ans:True
108. Which of the following are attribute selection methods for splitting a Decision Tree?
Ans:Information Gain
Gini index
Gain ratio
109.Why are SVMs fast?
Ans:Quadratic optimization (convex!)
They work in the dual, with relatively few points
The kernel trick
110. A set of observations recorded at equal intervals of time is called
Ans:Time series data
111.Which of the following are true about isolation forest?
Ans:Identifies anomalies as the observations with short average path lengths
Isolation forest is built based on ensembles of decision trees.
Isolation forest needs an anomaly Score to have an idea of how anomalous a data point is
Splits the data points by randomly selecting a value between the maximum and the minimum of the selected feature.
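A minimal sketch of these ideas with scikit-learn's `IsolationForest`; the synthetic data, contamination value, and random seeds are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0, scale=1, size=(100, 2))  # dense "normal" cloud
outlier = np.array([[8.0, 8.0]])                    # easily isolated point
X = np.vstack([normal, outlier])

forest = IsolationForest(random_state=0, contamination=0.01)
labels = forest.fit_predict(X)  # -1 = anomaly, 1 = normal
```

The far-away point has a short average path length across the trees, so it is the one flagged as an anomaly.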
112.What types of error does bias cause in a model?
Ans:Over Generalization
Underfitting
113.Which of the following metrics, do we have for finding dissimilarity between two clusters in hierarchical clustering? 1.Single-link 2.Complete-link 3. Average-link
Ans:1,2 and 3
114. The agglomerative approach is called the Top-Down Approach, whereas the divisive approach is called the Bottom-Up Approach.
Ans:False
115.The amount of output of one unit received by another unit depends on what?
Ans:weight
116.Which of the following statements about Naive Bayes are correct?
Ans:attributes are equally important.
attributes are statistically independent of one another given the class value.
attributes can be nominal or numeric
117.Which of the following is true about regularized regression?
Ans:Can help with bias trade-off
118. SVMs are more effective when
Ans:The data is linearly separable
The data is clean and ready to use
119.The log likelihood is parallel to?
Ans:The F-test in OLS regression
120.Standardisation of features is required before training a logistic regression.
Ans:False
121.Which of the following includes major tasks of NLP?
Ans:All of the mentioned
122. Content-Based Filtering depends only on the user's previous choices, making this method robust in avoiding the cold-start problem.
Ans:True
123.A time series data is a set of data recorded at
Ans:All the above
124.Which of the following distance metric can be used in k-NN?
Ans:Manhattan
Minkowski
Tanimoto
Jaccard
125.The hierarchical clustering method generates a similarity score [S(X,Y)] for all gene combinations, places the scores in a matrix, joins those genes that have the highest score, and then continues to join progressively less similar pairs.
Ans:True
126.Which of the following is the second goal of PCA?
Ans:Data Compression
127.Association rule approach is taken by decision tree for knowledge learning.
Ans:False
128.Which of the following is not an assumption of Linear Regression?
Ans:Multicollinearity
129.Topic modelling refers to the task of identifying documents that best describes a set of topics.
Ans:False
130.Support vectors are the data points that lie farthest to the decision surface.
Ans:False
131. Finding good hyperparameters is a time-consuming process, so typically you should do it once at the beginning of the project and try to find the best hyperparameters so that you don't have to revisit tuning them again.
Ans:False
132.Disadvantages of Naive Bayes Classifier:
Ans:Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship between features.
133. Neither XGBoost nor CatBoost has an inbuilt method for categorical features; encoding (one-hot, target encoding, etc.) should be performed by the user.
Ans:False
134.Select the appropriate option which describes the Single Linkage method.
Ans:In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points in each cluster.
135.Which of the following methods do we use to find the best fit line for data in Linear Regression?
Ans:Least Square Error
136.Irreducible error can be removed by increasing the model complexity.
Ans:False
137.The time series analysis helps:
Ans:All the above
138. What is the right order of components for a text classification model? 1. Text cleaning 2. Text annotation 3. Gradient descent 4. Model tuning 5. Text to predictors
Ans: 1, 2, 5, 3, 4
139.Why ML Explainability is important?
Ans:All of the Above
140. In k-NN, what happens when you increase/decrease the value of k?
Ans:The boundary becomes smoother with increasing value of K
141.Jaccard distance is a metric for comparing two binary data strings. While comparing two binary strings of equal length, Jaccard distance is the number of bit positions in which the two bits are different.
Ans:False
142.The curse of dimensionality refers to all the problems that arise working with data in the higher dimensions.
Ans:True
143.k-means clustering is not a method of vector quantization
Ans:False
144.Which of the following are true? Check all that apply
Ans:If you do not have any labeled data (or if all your data has label y = 0), then it is still possible to learn p(x), but it may be harder to evaluate the system or choose a good value of ϵ.
When choosing features for an anomaly detection system, it is a good idea to look for features that take on unusually large or small values for (mainly the) anomalous examples.
145.Which of the following is a Latent Factor Model?
Ans:Singular Value Decomposition
146. For K-fold cross-validation, smaller k implies less variance.
Ans:True
147. Which of the following statements is NOT TRUE about k-means?
Ans: Number of clusters to be built is typically a user input and it impacts the way clusters are created
148.How do you improve random forest accuracy?
Ans:Algorithm Tuning
Add more data
Feature Selection
149.PCA works better If the data lies on a curved surface and not on a flat surface.
Ans:False
150. What would be the relation between the time taken by 1-NN, 2-NN, and 3-NN?
Ans:1-NN ~ 2-NN ~ 3-NN
151.What is the field of Natural Language Processing (NLP)?
Ans:All of the mentioned
152. Which regularization is used to reduce the overfitting problem?
Ans:Both
153.PCA is mostly used for ______________.
Ans:Unsupervised Learning
154.Random Forest learns a coefficient for each input feature, which shows how much this feature influences the target feature. True/False?
Ans:False
155. What does "Naive" in Naive Bayes classifier refer to?
Ans: Strong independence assumption between the features/variables.
156. Logarithmic transformation helps to handle skewed data; after transformation, the distribution becomes closer to normal.
Ans:True
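A quick sketch of the idea with `numpy.log1p` (log(1 + x), which also handles zeros); the right-skewed sample values are an assumption for illustration:

```python
import numpy as np

skewed = np.array([1, 2, 3, 10, 100, 1000], dtype=float)
# The long right tail (values up to 1000) is compressed to under 7,
# while the ordering of the values is preserved
transformed = np.log1p(skewed)
```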
157.Statement 1: The cost function is altered by adding a penalty equivalent to the square of the magnitude of the coefficients
Statement 2: Ridge and Lasso regression are some of the simple techniques to reduce model complexity and prevent overfitting which may result from simple linear regression.
Ans:Both Statements are true
158. What will happen if you don't rotate the components?
Ans: The effect of PCA will diminish
159.The component of a time series which is attached to short term variation is:
Ans:Seasonal variation
160. In SVM, if the number of input features is 3, then the hyperplane is a ____.
Ans:Plane
161.LDA is a
Ans:Generative probabilistic process
162.Full form of SHAP
Ans:Shapley Additive Explanations
163.Which of the following is a disadvantage of decision trees?
Ans:Decision trees are prone to be overfit
164. In logistic regression, what do we estimate for each one-unit change in X?
Ans:How much the natural logarithm of the odds for Y = 1 changes
165.Feature engineering is a process of transforming the given data into a form which is easier to interpret.
Ans:True
166.A single document invokes multiple topics.
Ans:True
167. Which of the following is true when you choose the fraction of observations for building the base learners in a tree-based algorithm?
Ans: Decreasing the fraction of samples used to build the base learners will result in a decrease in variance
168.Which of the following neural networks uses supervised learning? (A) Multilayer perceptron (B) Self organizing feature map (C) Hopfield network
Ans:(A) only