Data Mining Quiz 4
Question 1
True/False: Supervised learning features both input variables or attributes and an output or predicted variable.
Select one:
- True
- False
The correct answer is 'True'.
Question 2
The sales of a company (in millions of dollars) for each year are shown in the table below. Identify the linear regression model in the form y=mx+b and report the values of m (slope) and b (intercept), as well as the estimated value of y when the value of x is 10.
NOTE: You should consider the value x as the elapsed time. For 2005 this would be 0 years, for 2006 it would be 1 year and for 2012 it would be 7 years. What is the value of b?
The correct answer is: 11.6
Question 3
True/False: Shared nothing architectures distribute the processing of queries to access large volumes of data and provide near linear scalability in both storage volume and query performance.
Select one:
- True
- False
The correct answer is 'True'.
Question 4
The sales of a company (in millions of dollars) for each year are shown in the table below. Identify the linear regression model in the form y=mx+b and report the values of m (slope) and b (intercept), as well as the estimated value of y when the value of x is 10.
NOTE: You should consider the value x as the elapsed time. For 2005 this would be 0 years, for 2006 it would be 1 year and for 2012 it would be 7 years. What is the value of m?
The correct answer is: 8.4
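The sales table is not reproduced in this copy, so the fit cannot be rerun on the real data. As a rough sketch of how m and b would come out of ordinary least squares, the Python below fits a line to synthetic points generated from the stated answers (m = 8.4, b = 11.6). The function name `fit_line` and the data points are illustrative assumptions, not the quiz's table.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b

# Synthetic data (NOT the quiz table): elapsed years 0..7 for 2005..2012,
# with y placed exactly on the line y = 8.4x + 11.6.
years_elapsed = list(range(8))
sales = [8.4 * x + 11.6 for x in years_elapsed]

m, b = fit_line(years_elapsed, sales)
print(round(m, 3), round(b, 3))
```

With real, noisy data the points would not sit exactly on the line, and the same function would return the best-fitting slope and intercept instead of recovering them exactly.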
Question 5
The income of a company that produces disaster equipment has been expressed as a linear regression model based upon the input variable, which is the number of hurricanes projected for the upcoming hurricane season. The model is expressed as Y = mX + b, where Y is the estimated sales in millions of dollars, m = 0.67 and b = 8.2. Assuming that the weather service is predicting 12 hurricanes during the season, what are the sales in millions of dollars expected to be?
The correct answer is: 16.24
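The arithmetic behind this answer can be checked with a short sketch. The helper name `predict` is an assumption for illustration; the numbers come from the question.

```python
def predict(x, m, b):
    """Evaluate the linear model y = m*x + b."""
    return m * x + b

# Values from the question: m = 0.67, b = 8.2, 12 hurricanes forecast.
expected_sales = predict(12, m=0.67, b=8.2)
print(round(expected_sales, 2))  # 16.24
```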
Question 6
Assume you have a linear model, with m = 0.05 and b = 10, that explains the relationship between income and credit extended. If income is 50,000, what credit will be extended?
Select one:
- a. 500
- b. 5010
- c. 20508.4
- d. 2510
The correct answer is: 2510
Question 7
The sales of a company (in millions of dollars) for each year are shown in the table below. Identify the linear regression model in the form y=mx+b.
NOTE: You should consider the value x as the elapsed time. For 2005 this would be 0 years, for 2006 it would be 1 year and for 2012 it would be 7 years. What is the predicted value of y (in millions of dollars) when the year is 2012?
The correct answer is: 70.4
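This answer follows from the slope and intercept given as answers to Questions 4 and 2 (m = 8.4, b = 11.6). The sketch below converts a calendar year to elapsed years and evaluates the line; `BASE_YEAR` and `predict_sales` are hypothetical names chosen for illustration.

```python
BASE_YEAR = 2005  # x = 0 corresponds to 2005, per the note in the question

def predict_sales(year, m=8.4, b=11.6):
    """Predicted sales (millions of dollars) from the quiz's fitted line."""
    x = year - BASE_YEAR  # elapsed years
    return m * x + b

print(round(predict_sales(2012), 1))  # x = 7 gives 70.4
print(round(predict_sales(2015), 1))  # x = 10 gives 95.6
```

The second call covers the "value of y when x is 10" part of the question, which works out to 8.4 × 10 + 11.6 = 95.6.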
Question 8
True/False: The snowflake schema differs from the star schema in that the tables holding the dimensional data are normalized.
Select one:
- True
- False
The correct answer is 'True'.
Question 9
True/False: Data Mining can be said to be a process designed to detect patterns in data sets.
Select one:
- True
- False
The correct answer is 'True'.
Question 10
True/False: According to our textbook, residual plots are a useful tool for identifying clusters.
Select one:
- True
- False
The correct answer is 'False'.
Question 11
True/False: A regression model has an R² statistic of 0.15. This indicates that the regression model is NOT a good fit and does a poor job of predicting the outcome based upon the input variables.
Select one:
- True
- False
The correct answer is 'True'.
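R² is the share of the variance in the outcome that the model explains, so a value of 0.15 means roughly 15% is explained. A minimal sketch of the calculation follows; the observations and predictions are hypothetical and serve only to contrast a close fit with a poor one.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical observations and two hypothetical sets of model predictions.
actual = [2.0, 4.0, 6.0, 8.0]
good_fit = [2.1, 3.9, 6.2, 7.8]
poor_fit = [5.0, 5.0, 5.0, 5.0]  # always predicts the mean of `actual`

print(round(r_squared(actual, good_fit), 3))
print(round(r_squared(actual, poor_fit), 3))
```

A model that does no better than predicting the mean scores 0, which is the baseline a value like 0.15 sits just above.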
Question 12
Assume that you have a data set which produces the following data plot. You wish to predict if a new case would be a ‘red’ case as opposed to a ‘blue’ case based upon the input attribute data. Which technique should you use?
Select one:
- a. Linear Regression
- b. Curvilinear Regression
- c. Spline Regression
- d. Logistic Regression
The correct answer is: Logistic Regression
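Logistic regression suits a two-class outcome because it passes a linear score through the logistic function to get a probability, which is then thresholded. The sketch below assumes a single input attribute and hypothetical coefficients `w` and `b`; a real model would estimate them from the plotted data.

```python
import math

def logistic(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, w, b, threshold=0.5):
    """Return 'red' if the modeled probability of red meets the threshold."""
    p_red = logistic(w * x + b)
    return "red" if p_red >= threshold else "blue"

# Hypothetical coefficients; the decision boundary sits at x = 2.
w, b = 1.5, -3.0
print(classify(1.0, w, b), classify(4.0, w, b))
```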
Question 13
True/False: Reinforcement learning features elements of both supervised learning and unsupervised learning as the outcome variable or predicted values are validated over time and feedback is used to continuously train the learning algorithm.
Select one:
- True
- False
The correct answer is 'True'.
Question 14
True or False: Qualitative variables are often referred to as categorical.
Select one:
- True
- False
The correct answer is 'True'.
Question 15
Which of the following is NOT a classification technique?
Select one:
- a. Logistic regression
- b. Linear discriminant analysis
- c. K-nearest neighbors
- d. Principal components analysis
The correct answer is: Principal components analysis
Question 16
True or False: Bayes' theorem classifies cases by calculating the probability that the case belongs to each class and then selecting the one with the highest probability.
Select one:
- True
- False
The correct answer is 'True'.
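The rule described here can be sketched as computing a posterior for each class and taking the largest. The priors and likelihoods below are hypothetical numbers, and `bayes_classify` is an illustrative name rather than a library function.

```python
def bayes_classify(priors, likelihoods):
    """Pick the class with the highest posterior P(class | evidence).

    priors: {class: P(class)}
    likelihoods: {class: P(evidence | class)}
    """
    # Unnormalized posteriors: P(evidence | class) * P(class).
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    total = sum(joint.values())  # P(evidence)
    posteriors = {c: joint[c] / total for c in joint}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Hypothetical numbers for a two-class problem.
priors = {"red": 0.3, "blue": 0.7}
likelihoods = {"red": 0.8, "blue": 0.2}
label, post = bayes_classify(priors, likelihoods)
print(label, {c: round(p, 3) for c, p in post.items()})
```

Here the less common class still wins because the evidence is much more likely under it.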
Question 17
The value of K should typically be an odd number for what reason?
Select one:
- a. It ensures that when classifying a solution there will not be a tie
- b. It makes the iterative process of the algorithm more efficient
- c. It enables the algorithm to be implemented using recursion
- d. None of these answers
The correct answer is: It ensures that when classifying a solution there will not be a tie
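With two classes, an even K can split the vote evenly while an odd K cannot. The sketch below shows this with a small majority-vote KNN; the training points are hypothetical (the quiz figures are not reproduced here) and `knn_classify` is an illustrative name.

```python
from collections import Counter
import math

def knn_classify(point, training, k):
    """Classify `point` by majority vote among its k nearest neighbors.

    training: list of ((x, y), label) pairs.
    Returns the winning label, or None if the top vote is tied.
    """
    by_distance = sorted(training, key=lambda item: math.dist(point, item[0]))
    votes = Counter(label for _, label in by_distance[:k]).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return None  # tie: no majority
    return votes[0][0]

# Hypothetical training points at distances 1, 2, 3, 4, 5 from the origin.
training = [((1, 0), "blue"), ((0, 2), "red"), ((3, 0), "red"),
            ((0, 4), "blue"), ((5, 0), "red")]
x = (0, 0)

for k in (1, 2, 3, 5):
    print(k, knn_classify(x, training, k))
```

For this made-up layout, K=2 produces a one-to-one tie, while K=1, 3, and 5 each give a clear winner.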
Question 18
Assuming K=1 how would the point X be classified using KNN?
Select one:
- a. Red
- b. Blue
The correct answer is: Blue
Question 19
Assuming K=3 how would the point X be classified using KNN?
Select one:
- a. Red
- b. Blue
The correct answer is: Red
Question 20
Assuming K=3, how would the point X be classified using KNN?
Select one:
- a. Red
- b. Blue
The correct answer is: Red
Question 21
Assuming K=5, how would the point X be classified using KNN?
Select one:
- a. Red
- b. Blue
The correct answer is: Red
Question 22
Assuming you have the following data values (4,6,9,20,8,7), what is the min-max normalized value for 6?
Use (X_v - min(X)) / (max(X) - min(X)), where X is the set of data values and X_v is the value to score. Provide your response rounded to the thousandths place: ___
The correct answer is: 0.125
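The answer can be reproduced with the standard min-max formula, (value − min) / (max − min), which here is (6 − 4) / (20 − 4). The function name below is an illustrative assumption.

```python
def min_max_normalize(values, v):
    """Scale v to [0, 1] relative to the range of `values`."""
    lo, hi = min(values), max(values)
    return (v - lo) / (hi - lo)

data = [4, 6, 9, 20, 8, 7]
print(round(min_max_normalize(data, 6), 3))  # 0.125
```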
Question 23
Assuming you have the following data values (3,6,9,14,2), what is the Z-Score normalized value for 5?
Use (X_v - mean(X)) / sd(X), where X is the set of data values and X_v is the value to score. Provide your response rounded to the thousandths place: ___
The correct answer is: -0.37
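The stated answer matches a z-score computed with the sample standard deviation (dividing by n − 1). Using the population standard deviation instead would give about −0.413. A short check with Python's standard `statistics` module:

```python
import statistics

data = [3, 6, 9, 14, 2]
value = 5

mean = statistics.mean(data)        # 6.8
sample_sd = statistics.stdev(data)  # sample standard deviation (n - 1)
z = (value - mean) / sample_sd
print(round(z, 3))  # -0.37
```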
Question 24
Assume that you are the data scientist for the GreatFoods! Supermarket chain. In an effort to increase sales of locally produced food such as eggs, milk, and bread, your manager asks you to develop a data mining solution that can identify the probability that a customer will purchase eggs when they purchase milk and vice versa. Which technique are you most likely to use?
Select one:
- a. Linear Regression
- b. K-nearest neighbors classification
- c. Bayes Classifier
- d. Hierarchical clustering
The correct answer is: Bayes Classifier
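The quantities the manager asks for are conditional probabilities, P(eggs | milk) and P(milk | eggs), which a Bayes-style approach estimates from transaction counts. The sketch below uses hypothetical baskets, and `conditional_probability` is an illustrative helper, not part of any library.

```python
def conditional_probability(baskets, given, target):
    """Estimate P(target in basket | given in basket) by counting."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        return None  # undefined when `given` never occurs
    return sum(target in b for b in with_given) / len(with_given)

# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"milk", "eggs", "bread"},
    {"milk", "eggs"},
    {"milk", "bread"},
    {"eggs"},
    {"milk"},
]

p_eggs_given_milk = conditional_probability(baskets, "milk", "eggs")
p_milk_given_eggs = conditional_probability(baskets, "eggs", "milk")
print(p_eggs_given_milk, round(p_milk_given_eggs, 3))
```

The two directions differ because they condition on different groups of baskets; Bayes' theorem links them through P(milk) and P(eggs).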