Quiz
The first stage of the data science methodology is Data Understanding.
- True
- False
The main purpose of the analytic approach is identifying what type of patterns will be needed to address the posed question most effectively.
- True
- False
Which machine learning algorithm was implemented in the case study discussed in the videos?
- Logistic Regression.
- k-Nearest Neighbor.
- Decision Tree Classification.
- Support Vector Machines.
For the case study, a decision tree classification model was used to identify the combination of conditions leading to each patient's outcome.
- True
- False
The Data Requirements stage of the data science methodology involves identifying the necessary data content, formats and sources for initial data collection.
- True
- False
Which of the following statements are correct?
- Data scientists determine how to collect the data.
- Data scientists identify the data that is required for data modeling.
- Data scientists determine how to prepare the data.
- None of the above.
In the Data Collection stage, the business understanding of the problem is revised and decisions are made as to whether or not more data is needed.
- True
- False
Database Administrators determine how to collect and prepare the data.
- True
- False
In the case study, working through the Data Preparation stage, it was revealed that the initial definition was not capturing all of the congestive heart failure admissions that were expected, based on clinical experience.
- True
- False
Select the correct statement about what data scientists do during the Data Preparation stage.
- During the Data Preparation stage, data scientists define the variables to be used in the model.
- During the Data Preparation stage, data scientists determine the timing of events.
- During the Data Preparation stage, data scientists aggregate the data and merge them from different sources.
- During the Data Preparation stage, data scientists identify missing data.
- All of the above statements are correct.
Select the correct statement about the Data Preparation stage of the data science methodology.
- Data Preparation is typically the least time-consuming methodological step.
- Data Preparation involves dealing with missing or improperly coded data and can include using text analysis to structure unstructured or semi-structured text data.
- Data Preparation cannot be accelerated through automation.
- None of the above statements are correct.
Which statement best describes the Modeling stage of the data science methodology?
- Modeling is always based on predictive models.
- Modeling always uses training and test sets.
- Modeling may require testing multiple algorithms and parameters.
- The Modeling stage is followed by the Analytic Approach stage.
Model Evaluation includes ensuring that the data are properly handled and interpreted.
- True
- False
Select the correct statements about the ROC curve.
- The ROC curve is a useful diagnostic tool for determining the optimal classification model.
- ROC stands for Receiver Operating Characteristic; the ROC curve was originally developed to detect enemy aircraft on radar.
- By plotting the true-positive rate against the false-positive rate for different values of the relative misclassification cost, the ROC curve can be used to select the optimal model.
- The ROC curve was originally developed to optimize healthcare and detect congestive heart failure readmission rate.
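As background for the ROC question, here is a minimal Python sketch of how the points on such a curve could be computed. It sweeps a decision threshold over model scores, which is a common stand-in for varying the relative misclassification cost and is an assumption of this sketch. The labels, scores, and the function name `roc_points` are made up for illustration.

```python
def roc_points(labels, scores):
    """Return (false-positive rate, true-positive rate) pairs, strictest threshold first."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical data: 1 = positive outcome, 0 = negative outcome.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
points = roc_points(labels, scores)

for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting these pairs with the false-positive rate on the x-axis gives the curve; a model whose curve sits closer to the top-left corner separates the classes better.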
Select the correct statement about the Feedback stage of the data science methodology.
- Feedback is not required once launched.
- Feedback is not helpful and gets in the way.
- Feedback is essential to the long term viability of the model.
- None of the above statements are correct.
A data scientist determines that building a recommender system is the solution for a particular business problem at hand. What stage of the data science methodology does this represent?
- Model Evaluation.
- Analytic Approach.
- Deployment.
- Modeling.
A car company asked a data scientist to determine what type of customers are more likely to purchase their vehicles. However, the data comes from several sources and is in a relatively "raw format". What kind of processing can the data scientist perform on the data to prepare it for the Modeling stage?
A. Feature Engineering.
B. Transforming the data into more useful variables.
C. Combining the data from the various sources.
D. Addressing missing or invalid values.
- Only options A and D are correct.
- Only option C is correct.
- None of the options are correct.
- All of the options are correct.
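For readers who want to see what processing raw, multi-source data could look like, here is a minimal Python sketch. The two sources, their field names, the valid age range, and the `repeat_visitor` feature are all hypothetical and only serve the illustration.

```python
# Hypothetical raw records from two sources, keyed by customer id.
crm_records = {
    101: {"age": 34, "income": 58000},
    102: {"age": None, "income": 72000},   # missing age
    103: {"age": -5, "income": 41000},     # invalid age
}
dealer_visits = {101: 3, 102: 1}           # customer 103 has no visit record


def prepare(crm, visits):
    """Join both sources, replace missing or invalid ages, and derive one feature."""
    valid_ages = [r["age"] for r in crm.values()
                  if r["age"] is not None and 0 < r["age"] < 120]
    fallback_age = sum(valid_ages) / len(valid_ages)  # mean of the valid ages
    rows = []
    for customer_id, record in sorted(crm.items()):
        age = record["age"]
        if age is None or not 0 < age < 120:
            age = fallback_age                        # fill missing or invalid value
        visit_count = visits.get(customer_id, 0)      # join with the second source
        rows.append({
            "customer_id": customer_id,
            "age": age,
            "income": record["income"],
            "visits": visit_count,
            "repeat_visitor": visit_count > 1,        # derived variable
        })
    return rows


prepared = prepare(crm_records, dealer_visits)
for row in prepared:
    print(row)
```

Filling gaps with the mean is only one possible choice; dropping the record or using a model-based estimate may suit other data sets better.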
Which of the following represents the two important characteristics of the data science methodology?
- It immediately ends when the model is deployed because no feedback is required.
- It is a highly iterative process and immediately ends when the model is deployed.
- It has no endpoint because data collection occurs before identifying the data requirements.
- It is a highly iterative process and it never ends.
Data scientists may use either a "top-down" approach or a "bottom-up" approach to data science. These two approaches refer to:
- "Top-down" approach: models are fit before the data is explored. "Bottom-up" approach: data is explored, and then a model is fit.
- "Top-down" approach: using massively parallel warehouses with huge data volumes as the data source. "Bottom-up" approach: using a sample of small data before using large data.
- "Top-down" approach: first defining a business problem, then analyzing the data to find a solution. "Bottom-up" approach: starting with the data, and then coming up with a business problem based on the data.
- "Top-down" approach: the data, when sorted, is modeled from the "top" of the data towards the "bottom". "Bottom-up" approach: the data is modeled from the "bottom" of the data to the "top".
What are three important reasons that data scientists should maintain continuous communication with business sponsors throughout a project?
- So that business sponsors can ensure the work remains on track to generate the intended solution.
- So that business sponsors can provide domain expertise.
- Actually, data scientists do not need to maintain a continuous communication with business sponsors and stakeholders.
- So that business sponsors can review intermediate findings.
Data scientists may frequently return to a previous stage to make adjustments, as they learn more about the data and the modeling.
- True.
- False.
For predictive models, a test set, which is similar to, but independent of, the training set, is used to determine how well the model predicts outcomes. This is an example of what step in the methodology?
- Analytic Approach.
- Data Requirements.
- Deployment.
- Model Evaluation.
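The idea of a held-out test set can be sketched in a few lines of Python. This is a toy example under stated assumptions: the data, the 25% test fraction, and the midpoint-threshold "model" are all invented for illustration and are not the method used in the course's case study.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as an independent test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Toy model: the midpoint between the mean feature value of each class."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, rows):
    """Fraction of rows whose predicted class (x >= threshold) matches the label."""
    correct = sum(1 for x, y in rows if (1 if x >= threshold else 0) == y)
    return correct / len(rows)


# Hypothetical (feature, label) pairs; class 1 has clearly larger feature values.
data = [(float(i), 0) for i in range(10)] + [(float(i), 1) for i in range(20, 30)]
train, test = train_test_split(data)
threshold = fit_threshold(train)      # fitted on the training rows only
test_accuracy = accuracy(threshold, test)
print(f"threshold={threshold:.2f}  test accuracy={test_accuracy:.2f}")
```

The key point is that the test rows never influence `fit_threshold`, so the accuracy measured on them estimates how the model behaves on unseen data.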
The first stage of the ________________ is Business Understanding.
- Data analysis methodology
- Data collection methodology
- Computer modeling methodology
- Data science methodology
Business Understanding is an important stage in the data science methodology because:
- It generates the data that will be used in the study.
- It clearly defines the problem and the needs from a business perspective.
- It ensures that the work generates all possible solutions.
- It is determined by the analytical approach you want to use.
According to the videos explaining the Data Requirements and Data Collection stages of the data science methodology, you can think of the Data Requirements and Data Collection stages as a cooking task, where the problem at hand is _______, and the data to answer the question is ________.
- The temperature; The shopping list
- The shopping list; The store
- The cooking style; The appliance
- The recipe; The ingredients
In the Data Collection stage, techniques such as ___________ and visualization can be applied to the data set, to assess the content, quality, and initial insights about the data.
- The supervised method
- Descriptive statistics
- Data manipulation
- The unsupervised method
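A first numerical look at a newly collected data set can be as simple as the following Python sketch, which uses only the standard library. The `ages` sample is hypothetical.

```python
import statistics

# Hypothetical sample: patient ages from an initial data pull.
ages = [54, 61, 47, 72, 66, 58, 61]

summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),   # sample standard deviation
    "min": min(ages),
    "max": max(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Summaries like these help reveal gaps, implausible ranges, or skewed distributions before any modeling starts.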
Select the correct statement.
- A training set is used for predictive modeling.
- A training set is used for descriptive modeling.
- A training set is used for data visualization.
- A training set is used for statistical analysis.
A type I error is a ____________.
- False-alarm error
- False-negative error
- Hypothesis error
- False-positive error
The Data Understanding stage refers to the stage of removing redundant data.
- True
- False
In what stage would you correct invalid values and address outliers?
- The Data Understanding stage
- The Data Preparation stage
- The Data Requirements stage
- The Modeling stage
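One common way to spot outliers, sketched here in Python, is the interquartile-range (IQR) rule. The 1.5 multiplier is a widely used convention rather than something taken from the course, and the sample values and the function name `flag_outliers` are hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one suspicious entry.
lengths_of_stay = [12, 14, 15, 15, 16, 18, 19, 95]
outliers = flag_outliers(lengths_of_stay)
print(outliers)
```

Whether a flagged value is corrected, removed, or kept is a judgment call that usually needs domain knowledge.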
Which of the following is NOT one of the final stages of the data science methodology?
- Data Preparation
- Evaluation
- Deployment
- Feedback
Deploying a model into production begins an iterative process that moves from Feedback to Model Refinement, and then to what?
- Redeployment
- Data storage
- Scalability
- None of the above
Select the correct sentence about the data science methodology as explained in the course.
- The data science methodology does not depend on a specific set of technologies or tools.
- The data science methodology always starts with Business Understanding.
- The data science methodology is an iterative process.
- All of the above
Support vector machines and neural networks are what type of algorithms?
- Clustering
- Classification
- Regression
- Extraction