Question 1

Consider the problem of classifying whether a given fruit is a muskmelon (Label: 1) or a mango (Label: 0). The following table shows the weight values collected for 5 muskmelons and 5 mangoes. Which of the following threshold values solves this problem with zero classification error?

A) 2

B) 2.5

C) 3

D) Doesn't exist

Answer: D

Question 2

Assume that a student blindly (i.e., without looking for any specific genre) gathers tons of e-books from her friends over several months and stores them in a single directory. After learning about machine learning algorithms, she decides to group the books into multiple categories by automatically reading the contents of the books one by one. Which of the following algorithms is most suitable for her?

A) Supervised classification algorithms

B) Unsupervised clustering algorithms

C) Supervised regression algorithms

D) No ML algorithm is suitable to solve this problem

Answer: B

Question

Common Data for the questions 3 to 7

A diabetic patient records his blood glucose level every morning before breakfast. He continued to do this for a year. The table below shows a few samples from the dataset, along with the food items he had the previous night. He wants to predict his daily glucose level using this data.

Question 3

Which of the following machine learning approaches is most suitable for his objective?

A) Classification

B) Regression

C) Clustering

D) Recommendation

Answer: B

Question 4

The dimension (d in R^d) of the feature space x is?

A) 1

B) 2

C) 3

D) 4

Answer: D

Question 5

The dimension (d in R^d) of the label space y is?

A) 1

B) 2

C) 3

D) 4

Answer: A

Question 6

Suppose the trained model f(x) is given as 5.8x₁ + 3.4x₂ + 20.9x₃ + 1.2x₄ + 79.8. One fine day, he had three idlis, two dosas, and one ice cream, and continued working beyond 11 p.m. to submit the Machine Learning assignment. What will be his glucose level the next day as predicted by the model?

A) 120.1

B) 123.5

C) 125.9

D) 129.6

Answer: B

Question 7

After noting the value predicted by the model, he decided to measure his glucose level using a glucometer. If the meter reads 120 mg/dL, what is the squared error between the predicted and the actual value?

A) 5.5

B) 10.8

C) 12.5

D) 16.8

Answer: C
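
As a quick check of the arithmetic, the squared error is the square of the difference between the prediction and the measurement. The sketch below assumes the prediction of 123.5 given as the answer to Question 6; the variable names are illustrative.

```python
# Squared error between a predicted and a measured glucose value.
predicted = 123.5   # model output, taken from the answer to Question 6
measured = 120.0    # glucometer reading in mg/dL

squared_error = (predicted - measured) ** 2
print(squared_error)  # 12.25
```

The exact value is 12.25, so 12.5 is the nearest listed option.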

Question

Common data for the questions 8 and 9

Consider the function f(w) = 0.1w² + w - 1. Suppose that we use a gradient descent algorithm to find the minimum of the function iteratively.

Question 8

With the initial guess w = 10 and step size η = 0.5, what is the value of w after the third iteration?

A) 5.8

B) 7.6

C) 8.3

D) 9.5

Answer: A
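
The iterations can be traced with a few lines of code. This is a minimal sketch assuming the standard update rule w ← w − η f'(w), with f'(w) = 0.2w + 1 obtained by differentiating f; the function names are illustrative.

```python
def gradient(w):
    # Derivative of f(w) = 0.1*w**2 + w - 1
    return 0.2 * w + 1.0

def gradient_descent(w0, eta, steps):
    """Return the list of iterates w0, w1, ..., w_steps."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - eta * gradient(w)   # move against the gradient
        history.append(w)
    return history

history = gradient_descent(w0=10.0, eta=0.5, steps=3)
print([round(w, 3) for w in history])  # [10.0, 8.5, 7.15, 5.935]
```

Under this update rule the third iterate is about 5.94, and 5.8 is the closest of the listed options.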

Question 9

How far is the current weight value (after 3 iterations) from the minimum of the function?

A) 8

B) 10

C) 12

D) 14

Answer: B
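
The distance can be checked by locating the minimum analytically (setting f'(w) = 0.2w + 1 to zero) and comparing it with the third iterate. The sketch below assumes the standard update w ← w − η f'(w) starting from w = 10 with η = 0.5.

```python
# Minimum of f(w) = 0.1*w**2 + w - 1: solve f'(w) = 0.2*w + 1 = 0
w_min = -1.0 / 0.2          # -5.0

# Three gradient-descent updates from w = 10 with step size 0.5
w = 10.0
for _ in range(3):
    w -= 0.5 * (0.2 * w + 1.0)

distance = abs(w - w_min)
print(round(distance, 3))   # 10.935
```

The distance is roughly 10.9, and 10 is the closest of the listed options.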

Question 10

Consider the following statements:

A) Neural networks can separate non-linear data points (i.e., data points which can't be separated by a line)

B) Neural networks use the back-propagation algorithm to iteratively update the weights (parameters) of the network.

Select the correct option about these statements

A) Only A is true

B) Only B is true

C) Both A and B are true

D) Neither A nor B is true

Answer: C

Question

Common Data for questions 11 to 13

The data points (x ∈ R) in the table below are to be grouped into 2 clusters, namely z₁ and z₂. Each data point is randomly assigned to a cluster as shown in the table. What is the mean (average) value of z₁ and z₂?

Question 11

The table shows one way of grouping the data points into 2 clusters. In how many ways can the samples be grouped into two clusters? (Note: A cluster can be empty, i.e., have no samples assigned to it.)

A) 1000

B) 1024

C) 1200

D) 1250

Answer: B
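
Each data point independently goes to either z₁ or z₂, so n points give 2^n assignments. The sketch below assumes the table holds 10 data points (the table is not reproduced here; n = 10 is the value consistent with the listed answer) and enumerates the assignments explicitly.

```python
from itertools import product

n_points = 10  # assumed number of data points in the table

# Every assignment is a tuple with one cluster label per data point.
assignments = list(product(("z1", "z2"), repeat=n_points))

print(len(assignments))  # 1024
print(2 ** n_points)     # 1024
```

The count includes the two assignments in which one of the clusters is empty, as the note in the question allows.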

Question 12

The mean value of z₁:

A) 1

B) 2

C) 3

D) 4

Answer: C

Question 13

The mean value of z₂:

A) 1

B) 2.2

C) 3

D) 4

Answer: B

Question 14

Which of the following algorithms helps reduce the dimension of data points from R^d to R^k, such that k << d?

A) K-means clustering

B) KNN classifier

C) Principal Component Analysis (PCA)

D) Perceptron

Answer: C

Question 15

Consider a column vector x of size 10 × 1. Then the product xx^T will be a?

A) scalar

B) vector

C) matrix

D) not possible

Answer: C
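
The shapes can be verified directly. The sketch below uses plain nested lists and two small illustrative helpers (`transpose`, `matmul`) rather than any particular library; the entries of x are arbitrary.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols]
            for row in a]

def shape(m):
    return (len(m), len(m[0]))

x = [[float(i)] for i in range(1, 11)]  # illustrative 10 x 1 column vector

outer = matmul(x, transpose(x))   # (10 x 1)(1 x 10): outer product
inner = matmul(transpose(x), x)   # (1 x 10)(10 x 1): inner product

print(shape(outer))  # (10, 10)
print(shape(inner))  # (1, 1)
```

The order matters: xx^T is a 10 × 10 matrix, whereas x^T x collapses to a single number.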

Question 16

Which of the following are binary classification problems? (More than one option may be correct.)

A) Categorize whether a tweet on Twitter is violent or nonviolent

B) Predict whether an upcoming movie will be successful or not

C) Recognize whether a digit in an image is zero or nine

D) Given a speech waveform (i.e., audio), identify whether the speaker is male or female

Answer: All options correct

Question 17

Consider the problem of classifying whether a given fruit is a muskmelon (Label: 1) or a mango (Label: 0). The following table shows the weight values collected for 5 muskmelons and 5 mangoes. Suppose that two K-NN classifiers, namely C₁ and C₂, are used for classification. Assume that k = 1 for classifier C₁ and k = 3 for classifier C₂. Further, both classifiers use the distance formula d = |x^t - x^i|, where i = 1, 2, 3, ..., 10. The classifiers then classify the given test point x^t = 2.5 as (O₁, O₂), where O₁ is the classification by C₁ and O₂ is the classification by C₂.

A) 1,1

B) 1,0

C) 0,1

D) 0,0

Answer: A

Question 18

Suppose we use a decision tree classification algorithm, setting the threshold for the feature to x < 2.5 in the root node. What is the information gain after the first split (i.e., level 1 of the tree)? Let log₂(0) be defined as zero.

A) 0.41

B) 1.9

C) 2.3

D) 2.9

Answer: A
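
Information gain is the entropy of the parent node minus the size-weighted entropy of the two children. The sketch below shows that computation for a single threshold split; the weights and labels are hypothetical, not the values from the question's table, so the printed number is not the answer to this question.

```python
from math import log2

def entropy(labels):
    """Entropy in bits. Classes with zero count contribute nothing,
    which matches the convention log2(0) = 0."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * log2(p)
    return result

def information_gain(xs, ys, threshold):
    """Gain of splitting on x < threshold versus x >= threshold."""
    left = [y for x, y in zip(xs, ys) if x < threshold]
    right = [y for x, y in zip(xs, ys) if x >= threshold]
    n = len(ys)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(ys) - weighted

# Hypothetical data: a split that separates the two classes perfectly.
xs = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
print(information_gain(xs, ys, 2.5))  # 1.0
```

A perfect split of a balanced two-class set yields the maximum gain of 1 bit; an imperfect split, as in the question, yields less.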

Question 19

Consider the following statements

A. Discriminative models learn a probability mapping from feature x to label y.

B. Generative models learn a joint probability distribution of feature x and label y.

C. Discriminative models do not learn about a probability distribution from which the features are drawn.

Which of these statements are true?

A) A and B

B) only A

C) All the statements: A,B and C

D) B and C

Answer: C

Question 20

Given the learned weight vector w, select the statement that correctly relates the decision line (i.e., the line that separates the data points) to the weight vector.

A) The angle between w and decision line is not 90°

B) It is always parallel to w

C) It can be oriented in any direction with respect to w based on the data points

D) It is always perpendicular to w

Answer: D
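
The perpendicularity can be checked numerically: any vector that runs along the line w·x + b = 0 has zero dot product with w. The weight vector and bias below are hypothetical values chosen for illustration.

```python
# Hypothetical 2-D weight vector and bias for the decision line w.x + b = 0
w = (2.0, 1.0)
b = -4.0

def point_on_line(x1):
    """Pick x1 and solve w[0]*x1 + w[1]*x2 + b = 0 for x2."""
    return (x1, -(w[0] * x1 + b) / w[1])

p, q = point_on_line(0.0), point_on_line(1.0)
direction = (q[0] - p[0], q[1] - p[1])   # vector along the decision line

dot = w[0] * direction[0] + w[1] * direction[1]
print(direction)  # (1.0, -2.0)
print(dot)        # 0.0
```

The bias b shifts the line but does not change its direction, so the result holds for any b.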

Question 21

The sigmoid function is a probabilistic function and hence its output always lies in the interval (0, 1). The statement is

A) true

B) false

Answer: A
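
A short numerical check of the range, assuming the usual definition σ(z) = 1 / (1 + e^(−z)); the inputs are arbitrary moderate values.

```python
from math import exp

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + exp(-z))

values = [sigmoid(z) for z in (-10.0, -1.0, 0.0, 1.0, 10.0)]

print(all(0.0 < v < 1.0 for v in values))  # True
print(sigmoid(0.0))                        # 0.5
```

Mathematically the output never reaches 0 or 1 for finite z; it only approaches them as z goes to −∞ or +∞.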

Question 22

Suppose the Perceptron model is used for a binary classification problem. Suppose further that all the data points in the data set are correctly classified for some w. Then using the Perceptron learning algorithm to update w will no longer modify the elements of w. The statement is

A) Always False

B) Always True

C) Not always True

Answer: B
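
The reason is that the perceptron rule changes w only when a point is misclassified. The sketch below runs one pass of that rule over a small hypothetical data set with labels +1/−1, starting from a weight vector that already classifies every point correctly; the bias term b is an assumption of this sketch.

```python
def perceptron_epoch(w, b, data):
    """One pass of the perceptron rule; returns updated w, b and update count."""
    updates = 0
    for x, y in data:
        activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * activation <= 0:              # misclassified or on the boundary
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b = b + y
            updates += 1
    return w, b, updates

# Hypothetical linearly separable points; w = [1, 0], b = 0 separates them.
data = [((2.0, 1.0), 1), ((3.0, 2.0), 1), ((-1.0, -1.0), -1), ((-2.0, 0.5), -1)]
w, b = [1.0, 0.0], 0.0

new_w, new_b, updates = perceptron_epoch(w, b, data)
print(new_w, new_b, updates)  # [1.0, 0.0] 0.0 0
```

With no misclassified point the update branch is never taken, so the weights come back unchanged.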

Question 23

Suppose that we are given a dataset for a binary classification problem with m data points. Half of the data points belong to the positive class (i.e., y = 1) and the rest of the data points belong to the negative class (i.e., y = -1). Somehow, we learned that the data points are not linearly separable. Then the statement "Using the perceptron algorithm to classify the data points with zero error is not possible" is?

A) true

B) false

Answer: A

Question 24

Suppose we remove the naive assumption in Naive Bayes (i.e., we assume no conditional independence) for a binary classification task that identifies whether text content on a social media platform is violent or non-violent. If we create a dictionary with 8 binary features (each corresponding to the presence or absence of a word in the tweet), then the number of parameters to be learned is?

A) 510

B) 511

C) 515

D) 517

Answer: B
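
The count follows from modelling the full joint distribution of the features for each class, plus the class prior. A minimal sketch of the arithmetic, with illustrative variable names:

```python
n_features = 8   # binary word-presence features
n_classes = 2    # violent / non-violent

# Full joint P(x1, ..., x8 | y): 2**8 outcomes per class,
# minus 1 because the probabilities must sum to 1.
per_class = 2 ** n_features - 1           # 255

# Class prior P(y): one free parameter for two classes.
prior = n_classes - 1                     # 1

total = n_classes * per_class + prior
print(total)        # 511

# For comparison, with the naive assumption there is one
# Bernoulli parameter per feature per class.
naive_total = n_classes * n_features + prior
print(naive_total)  # 17
```

Dropping the independence assumption therefore raises the parameter count from 17 to 511 for only 8 features, which is why the naive assumption is attractive in practice.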