Practice Set 31
Questions 301–310 (10 questions)
A machine learning (ML) specialist is using the Amazon SageMaker DeepAR forecasting algorithm to train a model on CPU-based Amazon EC2 On-Demand instances. The model currently takes multiple hours to train. The ML specialist wants to decrease the training time of the model. Which approaches will meet this requirement? (Choose two.) [{"voted_answers": "CD", "vote_count": 9, "is_most_voted": true}]
A chemical company has developed several machine learning (ML) solutions to identify chemical process abnormalities. The time series values of independent variables and the labels are available for the past 2 years and are sufficient to accurately model the problem. The regular operation label is marked as 0. The abnormal operation label is marked as 1. Process abnormalities have a significant negative effect on the company’s profits. The company must avoid these abnormalities. Which metrics will indicate an ML solution that will provide the GREATEST probability of detecting an abnormality? [{"voted_answers": "B", "vote_count": 3, "is_most_voted": true}]
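Study note: the probability of detecting an abnormality corresponds to recall (the true positive rate) on the class labeled 1. The sketch below computes recall and precision from predicted and true labels. The function names and sample labels are made up for illustration, and the note does not say which listed option is correct.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as abnormal."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(y_true, y_pred):
    """Share of real abnormalities that the model flagged: TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    """Share of flagged cases that were real abnormalities: TP / (TP + FP)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels: 0 = regular operation, 1 = abnormal operation.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
print(recall(y_true, y_pred), precision(y_true, y_pred))
```

A missed abnormality shows up as a false negative, which lowers recall but leaves precision unchanged.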
An online delivery company wants to choose the fastest courier for each delivery at the moment an order is placed. The company wants to implement this feature for existing users and new users of its application. Data scientists have trained separate models with XGBoost for this purpose, and the models are stored in Amazon S3. There is one model for each city where the company operates. Operation engineers are hosting these models in Amazon EC2 for responding to the web client requests, with one instance for each model, but the instances have only a 5% utilization in CPU and memory. The operation engineers want to avoid managing unnecessary resources. Which solution will enable the company to achieve its goal with the LEAST operational overhead? [{"voted_answers": "B", "vote_count": 5, "is_most_voted": true}]
A company builds computer-vision models that use deep learning for the autonomous vehicle industry. A machine learning (ML) specialist uses an Amazon EC2 instance that has a CPU:GPU ratio of 12:1 to train the models. The ML specialist examines the instance metric logs and notices that the GPU is idle half of the time. The ML specialist must reduce training costs without increasing the duration of the training jobs. Which solution will meet these requirements? [{"voted_answers": "B", "vote_count": 3, "is_most_voted": false}, {"voted_answers": "D", "vote_count": 2, "is_most_voted": true}]
A company wants to forecast the daily price of newly launched products based on 3 years of data for older product prices, sales, and rebates. The time-series data has irregular timestamps and is missing some values. A data scientist must build a dataset to replace the missing values. The data scientist needs a solution that resamples the data daily and exports the data for further modeling. Which solution will meet these requirements with the LEAST implementation effort? [{"voted_answers": "C", "vote_count": 6, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 4, "is_most_voted": false}]
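Study note: to make "resample daily and replace missing values" concrete, here is a minimal standard-library sketch. It assumes readings arrive as (timestamp, value) pairs, averages the readings that fall on the same day, and forward-fills days without a usable reading. The function name, the forward-fill choice, and the sample data are illustrative assumptions. This is not the managed-service approach the question asks about.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def resample_daily(records):
    """Return one (date, value) pair per calendar day.

    records: list of (datetime, float or None) with irregular timestamps.
    Same-day readings are averaged; days with no usable reading reuse the
    most recent daily value (forward fill). Leading gaps stay None.
    """
    if not records:
        return []
    buckets = defaultdict(list)
    for timestamp, value in records:
        if value is not None:
            buckets[timestamp.date()].append(value)
    day = min(timestamp.date() for timestamp, _ in records)
    last_day = max(timestamp.date() for timestamp, _ in records)
    resampled = []
    previous = None
    while day <= last_day:
        values = buckets.get(day)
        if values:
            previous = sum(values) / len(values)
        resampled.append((day, previous))
        day += timedelta(days=1)
    return resampled


# Hypothetical price readings with a missing value and a skipped day.
readings = [
    (datetime(2024, 1, 1, 9, 0), 10.0),
    (datetime(2024, 1, 1, 17, 30), 12.0),
    (datetime(2024, 1, 3, 8, 15), None),
    (datetime(2024, 1, 4, 12, 0), 20.0),
]
for day, price in resample_daily(readings):
    print(day.isoformat(), price)
```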
A data scientist is building a forecasting model for a retail company by using the most recent 5 years of sales records that are stored in a data warehouse. The dataset contains sales records for each of the company’s stores across five commercial regions. The data scientist creates a working dataset with StoreID, Region, Date, and Sales Amount as columns. The data scientist wants to analyze yearly average sales for each region. The scientist also wants to compare how each region performed compared to average sales across all commercial regions. Which visualization will help the data scientist better understand the data trend? [{"voted_answers": "D", "vote_count": 5, "is_most_voted": true}]
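Study note: whichever chart is chosen, it has to show two aggregates: average sales per year and region, and average sales per year across all regions. The sketch below computes both from rows shaped like the working dataset. The function name and sample rows are hypothetical, and the overall average is assumed to be the mean over all records in a year rather than the mean of the regional means.

```python
from collections import defaultdict


def yearly_region_averages(rows):
    """Aggregate (store_id, region, 'YYYY-MM-DD', sales_amount) rows.

    Returns (region_avg, overall_avg) where region_avg maps (year, region)
    to mean sales and overall_avg maps year to mean sales over all records.
    """
    region_totals = defaultdict(lambda: [0.0, 0])
    year_totals = defaultdict(lambda: [0.0, 0])
    for _store_id, region, date_text, amount in rows:
        year = int(date_text[:4])
        for accumulator in (region_totals[(year, region)], year_totals[year]):
            accumulator[0] += amount
            accumulator[1] += 1
    region_avg = {key: total / count for key, (total, count) in region_totals.items()}
    overall_avg = {year: total / count for year, (total, count) in year_totals.items()}
    return region_avg, overall_avg


# Hypothetical sample of the working dataset.
sample_rows = [
    ("S1", "North", "2022-01-05", 100.0),
    ("S2", "North", "2022-06-01", 300.0),
    ("S3", "South", "2022-03-01", 50.0),
    ("S1", "North", "2023-01-05", 400.0),
]
region_avg, overall_avg = yearly_region_averages(sample_rows)
for (year, region), value in sorted(region_avg.items()):
    print(year, region, value, "overall:", overall_avg[year])
```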
A company uses sensors on devices such as motor engines and factory machines to measure parameters such as temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce service outages. A machine learning (ML) specialist needs to gather the sensor data to train a model to predict device malfunctions. The ML specialist must ensure that the data does not contain outliers before training the model. How can the ML specialist meet these requirements with the LEAST operational overhead? [{"voted_answers": "C", "vote_count": 7, "is_most_voted": true}, {"voted_answers": "D", "vote_count": 1, "is_most_voted": false}]
A data scientist obtains a tabular dataset that contains 150 correlated features with different ranges to build a regression model. The data scientist needs to achieve more efficient model training by implementing a solution that minimizes impact on the model’s performance. The data scientist decides to perform a principal component analysis (PCA) preprocessing step to reduce the number of features to a smaller set of independent features before the data scientist uses the new features in the regression model. Which preprocessing step will meet these requirements? [{"voted_answers": "B", "vote_count": 8, "is_most_voted": true}, {"voted_answers": "C", "vote_count": 3, "is_most_voted": false}]
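Study note: PCA picks directions of greatest variance, so features with larger numeric ranges dominate the components unless the columns are first put on a comparable scale. The sketch below shows z-score standardization only. The PCA step itself would come from a numerical library and is left out. The function name and sample rows are illustrative assumptions, and the note does not say which listed option is correct.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation.

    rows: list of equal-length lists of floats, one inner list per record.
    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    scaled_columns = []
    for column in zip(*rows):
        center = mean(column)
        spread = pstdev(column)
        if spread == 0:
            scaled_columns.append([0.0] * len(column))
        else:
            scaled_columns.append([(value - center) / spread for value in column])
    return [list(record) for record in zip(*scaled_columns)]


# Hypothetical features: the second column's range is 1000 times larger.
raw = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
scaled = standardize_columns(raw)
for record in scaled:
    print(record)
```

After scaling, both columns carry the same weight, so a later PCA step is driven by correlation structure and not by measurement units.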
An online retailer collects the following data on customer orders: demographics, behaviors, location, shipment progress, and delivery time. A data scientist joins all the collected datasets. The result is a single dataset that includes 980 variables. The data scientist must develop a machine learning (ML) model to identify groups of customers who are likely to respond to a marketing campaign. Which combination of algorithms should the data scientist use to meet this requirement? (Choose two.) [{"voted_answers": "BD", "vote_count": 5, "is_most_voted": true}]
A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model did not generalize well on the validation dataset. The engineer realizes that the original dataset was imbalanced. What should the engineer do to improve the validation accuracy of the model? [{"voted_answers": "A", "vote_count": 6, "is_most_voted": true}]
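Study note: one common response to an imbalanced dataset is a stratified split, which keeps each class's share roughly equal in the training and validation sets. The sketch below is a minimal standard-library version. The function name, the 20% validation fraction, and the bird labels are illustrative assumptions, and the note does not say which listed option is correct.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction=0.2, seed=0):
    """Split sample indices so each class keeps about the same proportion.

    Returns (train_indices, validation_indices). Every class contributes at
    least one validation sample, so a class with a single sample ends up
    only in the validation set.
    """
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    train, validation = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_validation = max(1, round(len(indices) * validation_fraction))
        validation.extend(indices[:n_validation])
        train.extend(indices[n_validation:])
    return sorted(train), sorted(validation)


# Hypothetical imbalanced labels: 90 common birds, 10 rare birds.
bird_labels = ["sparrow"] * 90 + ["eagle"] * 10
train_idx, val_idx = stratified_split(bird_labels)
print(len(train_idx), len(val_idx))
```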