25

Practice Set 25

Questions 241–250 (10 questions)

240

Each morning, a data scientist at a rental car company creates insights about the previous day’s rental car reservation demands. The company needs to automate this process by streaming the data to Amazon S3 in near real time. The solution must detect high-demand rental cars at each of the company’s locations. The solution also must create a visualization dashboard that automatically refreshes with the most recent data.Which solution will meet these requirements with the LEAST development time? [{"voted_answers": "A", "vote_count": 9, "is_most_voted": true}]

241

A company is planning a marketing campaign to promote a new product to existing customers. The company has data for past promotions that are similar. The company decides to try an experiment to send a more expensive marketing package to a smaller number of customers. The company wants to target the marketing campaign to customers who are most likely to buy the new product. The experiment requires that at least 90% of the customers who are likely to purchase the new product receive the marketing materials.The company trains a model by using the linear learner algorithm in Amazon SageMaker. The model has a recall score of 80% and a precision of 75%.How should the company retrain the model to meet these requirements? [{"voted_answers": "A", "vote_count": 9, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 3, "is_most_voted": false}]

242

A wildlife research company has a set of images of lions and cheetahs. The company created a dataset of the images. The company labeled each image with a binary label that indicates whether an image contains a lion or cheetah. The company wants to train a model to identify whether new images contain a lion or cheetah.Which Amazon SageMaker algorithm will meet this requirement? [{"voted_answers": "B", "vote_count": 6, "is_most_voted": true}]

243

A data scientist for a medical diagnostic testing company has developed a machine learning (ML) model to identify patients who have a specific disease. The dataset that the scientist used to train the model is imbalanced. The dataset contains a large number of healthy patients and only a small number of patients who have the disease. The model should consider that patients who are incorrectly identified as positive for the disease will increase costs for the company.Which metric will MOST accurately evaluate the performance of this model? [{"voted_answers": "D", "vote_count": 7, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 1, "is_most_voted": false}]

244

A machine learning (ML) specialist is training a linear regression model. The specialist notices that the model is overfitting. The specialist applies an L1 regularization parameter and runs the model again. This change results in all features having zero weights.What should the ML specialist do to improve the model results? [{"voted_answers": "B", "vote_count": 9, "is_most_voted": true}]

245

A machine learning (ML) engineer is integrating a production model with a customer metadata repository for real-time inference. The repository is hosted in Amazon SageMaker Feature Store. The engineer wants to retrieve only the latest version of the customer metadata record for a single customer at a time.Which solution will meet these requirements? [{"voted_answers": "D", "vote_count": 8, "is_most_voted": true}, {"voted_answers": "A", "vote_count": 1, "is_most_voted": false}]

246

A company’s data scientist has trained a new machine learning model that performs better on test data than the company’s existing model performs in the production environment. The data scientist wants to replace the existing model that runs on an Amazon SageMaker endpoint in the production environment. However, the company is concerned that the new model might not work well on the production environment data.The data scientist needs to perform A/B testing in the production environment to evaluate whether the new model performs well on production environment data.Which combination of steps must the data scientist take to perform the A/B testing? (Choose two.) [{"voted_answers": "AE", "vote_count": 7, "is_most_voted": true}, {"voted_answers": "AC", "vote_count": 1, "is_most_voted": false}]

247

A data scientist is working on a forecast problem by using a dataset that consists of .csv files that are stored in Amazon S3. The files contain a timestamp variable in the following format:March 1st, 2020, 08:14pm -There is a hypothesis about seasonal differences in the dependent variable. This number could be higher or lower for weekdays because some days and hours present varying values, so the day of the week, month, or hour could be an important factor. As a result, the data scientist needs to transform the timestamp into weekdays, month, and day as three separate variables to conduct an analysis.Which solution requires the LEAST operational overhead to create a new dataset with the added features? [{"voted_answers": "C", "vote_count": 9, "is_most_voted": true}]

248

A manufacturing company has a production line with sensors that collect hundreds of quality metrics. The company has stored sensor data and manual inspection results in a data lake for several months. To automate quality control, the machine learning team must build an automated mechanism that determines whether the produced goods are good quality, replacement market quality, or scrap quality based on the manual inspection results.Which modeling approach will deliver the MOST accurate prediction of product quality? [{"voted_answers": "B", "vote_count": 10, "is_most_voted": true}]

249

A healthcare company wants to create a machine learning (ML) model to predict patient outcomes. A data science team developed an ML model by using a custom ML library. The company wants to use Amazon SageMaker to train this model. The data science team creates a custom SageMaker image to train the model. When the team tries to launch the custom image in SageMaker Studio, the data scientists encounter an error within the application.Which service can the data scientists use to access the logs for this error? [{"voted_answers": "D", "vote_count": 9, "is_most_voted": true}]