Data Mining Quiz 3
Question 1
The sales of a company that produces disaster equipment are modeled by a linear regression whose input variable is the number of hurricanes projected for the upcoming hurricane season. The model is expressed as Y = mX + b, where Y is the estimated sales in millions of dollars, m = 0.76, and b = 5. If the weather service predicts 6 hurricanes during the season, what are the expected sales in millions of dollars?
______ million dollars
The correct answer is: 9.56
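The arithmetic behind this answer can be checked with a short Python sketch. The function name `predict_sales` is a hypothetical helper introduced only for illustration; the slope, intercept, and hurricane count are taken from the question.

```python
def predict_sales(hurricanes, slope=0.76, intercept=5.0):
    """Evaluate Y = mX + b for the model given in the question."""
    return slope * hurricanes + intercept


# Six hurricanes predicted: 0.76 * 6 + 5
expected_sales = predict_sales(6)
print(round(expected_sales, 2))  # prints 9.56
```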
Question 2
True or False: The following data plot represents data that is linearly separable.
Select one:
- True
- False
The correct answer is 'False'.
Question 3
Which of the following functions is used to generate a linear regression model within R?
Select one:
- a. lredict()
- b. lm()
- c. lstat()
- d. glm()
The correct answer is: lm()
Question 4
You have a dataset which produces the following plot and you need to create a predictive model. Which of the following techniques are you most likely to use?
Select one:
- a. Linear Regression
- b. Curvilinear Regression
- c. K-Nearest Neighbors
- d. Logistic Regression
The correct answer is: Linear Regression
Question 5
True or False: Residual plots are a useful tool for identifying non-linearity.
Select one:
- True
- False
The correct answer is 'True'.
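As a rough illustration of why residual plots reveal non-linearity, the sketch below fits a straight line to made-up data that follows y = x² and lists the residuals. The data set and the helper `fit_line` are assumptions introduced for this example; they are not part of the quiz.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data with a purely quadratic relationship: y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)

# Residual = observed value minus the value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"x = {x:2d}  residual = {r:5.1f}")
```

The residuals are positive at both ends of the x range and negative in the middle. A U-shaped pattern like this, rather than a random scatter around zero, suggests that a straight line is the wrong model form.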
Question 6
The names() function within R:
Select one:
- a. Lists all of the column names in the data frame provided as an argument to the function.
- b. Attaches the names to make the variables in the data frame available by name.
- c. Displays the names of the classes identified by the K means clustering algorithm.
- d. None of these answers
The correct answer is: Lists all of the column names in the data frame provided as an argument to the function.
Question 7
When data observations are placed into specific groups according to their observed characteristics, this is known as: __
Select one:
- a. Classification
- b. Decision Tree Analysis
- c. Clustering
- d. Regression
The correct answer is: Classification
Question 8
True or False: A linear regression model can be used to predict categorical data values.
Select one:
- True
- False
The correct answer is 'False'.
Question 9
When using a relational database engine as the backend for analytics processing, the acronym ______ is used to describe it.
Select one:
- a. MOLAP
- b. ROLAP
- c. OLAP
- d. RDBMS
The correct answer is: ROLAP
Question 10
Which of the following statements will generate a multiple linear regression model within R where the output (predicted) variable is sales and the predictor variables are temperature and unemploymentrate?
Select one:
- a. lm(sales~temperature+unemploymentrate)
- b. lm(temperature+unemploymentrate=sales)
- c. lm(sales+temperature~unemploymentrate)
- d. None of these commands are valid
The correct answer is: lm(sales~temperature+unemploymentrate)
Question 11
The following diagram represents which technique?
Select one:
- a. Linear Regression
- b. Curvilinear Regression
- c. Spline Regression
- d. Polynomial curve fitting
The correct answer is: Curvilinear Regression
Question 12
A linear regression model is expressed as y ≈ β0 + β1x, where β0 is the intercept and β1 is the slope of the line. The following least-squares equations can be used to compute the values of the coefficients β0 and β1, where x̄ and ȳ are the sample means:
β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
β0 = ȳ − β1·x̄
Using the following set of data, find the coefficients β0 and β1, rounded to the nearest thousandth, and the predicted value of y when x is 10. {(-1, 0), (0, 2), (1, 4), (2, 5)} What is the predicted value of y?
The correct answer is: 18.9
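A minimal Python sketch of the calculation follows. It assumes the standard least-squares formulas β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β0 = ȳ − β1·x̄. The variable names are illustrative only. The same coefficients answer the two related questions on β1 and β0.

```python
# Data set from the question.
points = [(-1, 0), (0, 2), (1, 4), (2, 5)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

n = len(points)
mean_x = sum(xs) / n  # 0.5
mean_y = sum(ys) / n  # 2.75

# Sum of cross-deviations and sum of squared x-deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)  # 8.5
sxx = sum((x - mean_x) ** 2 for x in xs)                   # 5.0

beta1 = round(sxy / sxx, 3)                # slope
beta0 = round(mean_y - beta1 * mean_x, 3)  # intercept
y_at_10 = round(beta0 + beta1 * 10, 3)     # prediction at x = 10

print(beta0, beta1, y_at_10)  # prints 1.9 1.7 18.9
```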
Question 13
A linear regression model is expressed as y ≈ β0 + β1x, where β0 is the intercept and β1 is the slope of the line. The following least-squares equations can be used to compute the values of the coefficients β0 and β1, where x̄ and ȳ are the sample means:
β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
β0 = ȳ − β1·x̄
Using the following set of data, find the coefficients β0 and β1, rounded to the nearest thousandth, and the predicted value of y when x is 10. {(-1, 0), (0, 2), (1, 4), (2, 5)} What is the value of β1?
The correct answer is: 1.7
Question 14
Assume that you have a variety of data, including medical history, diet, and hereditary factors, on individuals who developed cancer, and you want to use this data to determine whether a person is likely to develop cancer. Which technique would be the most promising to start with?
Select one:
- a. Classification
- b. Regression
- c. Clustering
- d. Estimation
The correct answer is: Classification
Question 15
A linear regression model is expressed as y ≈ β0 + β1x, where β0 is the intercept and β1 is the slope of the line. The following least-squares equations can be used to compute the values of the coefficients β0 and β1, where x̄ and ȳ are the sample means:
β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
β0 = ȳ − β1·x̄
Using the following set of data, find the coefficients β0 and β1, rounded to the nearest thousandth, and the predicted value of y when x is 10. {(-1, 0), (0, 2), (1, 4), (2, 5)} What is the value of β0?
The correct answer is: 1.9