Practice Set 20

Questions 191–200 (10 questions)

190

A machine learning (ML) specialist at a retail company is forecasting sales for one of the company's stores. The ML specialist is using data from the past 10 years. The company has provided a dataset that includes the total amount of money in sales each day for the store. Approximately 5% of the days are missing sales data. The ML specialist builds a simple forecasting model with the dataset and discovers that the model performs poorly. The performance is poor around the time of seasonal events, when the model consistently predicts sales figures that are too low or too high. Which actions should the ML specialist take to try to improve the model's performance? (Choose two.)

Community votes: AC (11 votes, most voted); AE (10 votes); CE (2 votes)
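The roughly 5% of days without a sales figure have to be handled in some way before a forecasting model is fit. The following is a minimal sketch of one possible approach, assuming that linear interpolation between the nearest known days is acceptable; the question itself does not name an imputation method, and the helper `interpolate_missing` and the sample figures are hypothetical.

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between known neighbours.

    Gaps at the start or end of the series are filled with the nearest
    known value, because there is no second point to interpolate towards.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")

    filled = list(values)

    # Leading and trailing gaps: copy the nearest known value.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]

    # Interior gaps: straight line between each pair of known points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)

    return filled


# Hypothetical daily sales totals with three missing days.
daily_sales = [100.0, None, 120.0, None, None, 150.0]
print(interpolate_missing(daily_sales))
```

Interpolation only addresses the missing days; it does not by itself capture the seasonal effects described in the question.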

191

A newspaper publisher has a table of customer data that consists of several numerical and categorical features, such as age and education history, as well as subscription status. The company wants to build a targeted marketing model for predicting the subscription status based on the table data. Which Amazon SageMaker built-in algorithm should be used to model the targeted marketing?

Community votes: B (23 votes, most voted); D (1 vote)

192

A company will use Amazon SageMaker to train and host a machine learning model for a marketing campaign. The data must be encrypted at rest. Most of the data is sensitive customer data. The company wants AWS to maintain the root of trust for the encryption keys and wants key usage to be logged. Which solution will meet these requirements with the LEAST operational overhead?

Community votes: B (12 votes, most voted); A (2 votes)

193

A data scientist is working on a model to predict a company's required inventory stock levels. All historical data is stored in .csv files in the company's data lake on Amazon S3. The dataset consists of approximately 500 GB of data. The data scientist wants to use SQL to explore the data before training the model. The company wants to minimize costs. Which option meets these requirements with the LEAST operational overhead?

Community votes: B (18 votes, most voted)

194

A geospatial analysis company processes thousands of new satellite images each day to produce vessel detection data for commercial shipping. The company stores the training data in Amazon S3. The training data incrementally increases in size with new images each day. The company has configured an Amazon SageMaker training job to use a single ml.p2.xlarge instance with File input mode to train the built-in Object Detection algorithm. The training process was successful last month but is now failing because of a lack of storage. Aside from the addition of training data, nothing has changed in the model training process. A machine learning (ML) specialist needs to change the training configuration to fix the problem. The solution must optimize performance and must minimize the cost of training. Which solution will meet these requirements?

Community votes: B (16 votes, most voted)

195

A company is using Amazon SageMaker to build a machine learning (ML) model to predict customer churn based on customer call transcripts. Audio files from customer calls are located in an on-premises VoIP system that has petabytes of recorded calls. The on-premises infrastructure has high-velocity networking and connects to the company's AWS infrastructure through a VPN connection over a 100 Mbps connection. The company has an algorithm for transcribing customer calls that requires GPUs for inference. The company wants to store these transcriptions in an Amazon S3 bucket in the AWS Cloud for model development. Which solution should an ML specialist use to deliver the transcriptions to the S3 bucket as quickly as possible?

Community votes: A (13 votes, most voted); C (3 votes); D (1 vote); AD (1 vote); B (1 vote)
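The scale mismatch between "petabytes" of audio and a 100 Mbps link can be made concrete with a little arithmetic. The sketch below assumes a single decimal petabyte (10^15 bytes) as an illustrative size, since the question gives no exact figure, and assumes the link is fully used with no protocol overhead; the function name `transfer_days` is hypothetical.

```python
def transfer_days(size_bytes: float, link_mbps: float) -> float:
    """Days needed to move size_bytes over a link of link_mbps megabits/s.

    Assumes 100% link utilisation and no protocol overhead, so real
    transfers would take longer.
    """
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000)
    return seconds / 86_400  # seconds per day


ONE_PETABYTE = 10**15  # decimal petabyte, used only as an illustrative size

print(f"{transfer_days(ONE_PETABYTE, 100):.0f} days")  # prints "926 days"
```

Under these assumptions, a single petabyte of raw audio would take about two and a half years to cross the link, which is why the volume of data sent over the connection matters for any answer.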

196

A company has a podcast platform that has thousands of users. The company has implemented an anomaly detection algorithm to detect low podcast engagement based on a 10-minute running window of user events such as listening, pausing, and exiting the podcast. A machine learning (ML) specialist is designing the data ingestion of these events with the knowledge that the event payload needs some small transformations before inference. How should the ML specialist design the data ingestion to meet these requirements with the LEAST operational overhead?

Community votes: B (21 votes, most voted); C (20 votes)
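To illustrate what a 10-minute running window over user events means, here is a minimal in-memory sketch. It is not a managed streaming service and is not meant to suggest an answer; the class `RunningWindow`, its methods, and the event names are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Tuple

WINDOW_SECONDS = 600  # 10 minutes


class RunningWindow:
    """Keeps only the events from the most recent window_seconds."""

    def __init__(self, window_seconds: int = WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()

    def add(self, timestamp: float, event_type: str) -> None:
        """Record an event, then evict events that fell out of the window."""
        self.events.append((timestamp, event_type))
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def count(self, event_type: str) -> int:
        """Number of events of the given type still inside the window."""
        return sum(1 for _, kind in self.events if kind == event_type)


window = RunningWindow()
window.add(0, "listen")
window.add(300, "pause")
window.add(700, "listen")  # the event at t=0 is now older than 10 minutes
print(window.count("listen"))
```

An anomaly detector would read aggregate values such as these counts each time the window advances.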

197

A company wants to predict the classification of documents that are created from an application. New documents are saved to an Amazon S3 bucket every 3 seconds. The company has developed three versions of a machine learning (ML) model within Amazon SageMaker to classify document text. The company wants to deploy these three versions to predict the classification of each document. Which approach will meet these requirements with the LEAST operational overhead?

Community votes: B (20 votes, most voted); C (4 votes)

198

A manufacturing company needs to identify returned smartphones that have been damaged by moisture. The company has an automated process that produces 2,000 diagnostic values for each phone. The database contains more than five million phone evaluations. The evaluation process is consistent, and there are no missing values in the data. A machine learning (ML) specialist has trained an Amazon SageMaker linear learner ML model to classify phones as moisture damaged or not moisture damaged by using all available features. The model's F1 score is 0.6. Which changes in model training would MOST likely improve the model's F1 score? (Choose two.)

Community votes: AE (25 votes, most voted); AB (1 vote)
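For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the counts in the example are hypothetical and were chosen only so that the result matches the 0.6 quoted in the question.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall).

    tp, fp, fn are true-positive, false-positive and false-negative counts.
    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 0.6 and recall = 0.6, so F1 is about 0.6.
print(round(f1_score(tp=60, fp=40, fn=40), 3))
```

Because F1 ignores true negatives, it improves only when false positives or false negatives fall relative to true positives.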

199

A company is building a machine learning (ML) model to classify images of plants. An ML specialist has trained the model using the Amazon SageMaker built-in Image Classification algorithm. The model is hosted using a SageMaker endpoint on an ml.m5.xlarge instance for real-time inference. When used by researchers in the field, the inference has greater latency than is acceptable. The latency gets worse when multiple researchers perform inference at the same time on their devices. Using Amazon CloudWatch metrics, the ML specialist notices that the ModelLatency metric shows a high value and is responsible for most of the response latency. The ML specialist needs to fix the performance issue so that researchers can experience less latency when performing inference from their devices. Which action should the ML specialist take to meet this requirement?

Community votes: B (20 votes, most voted)
