Practice Set 22

Questions 211–220 (10 questions)

210

A bank wants to use a machine learning (ML) model to predict if users will default on credit card payments. The training data consists of 30,000 labeled records and is evenly balanced between two categories. For the model, an ML specialist selects the Amazon SageMaker built-in XGBoost algorithm and configures a SageMaker automatic hyperparameter optimization job with the Bayesian method. The ML specialist uses the validation accuracy as the objective metric. When the bank implements the solution with this model, the prediction accuracy is 75%. The bank has given the ML specialist 1 day to improve the model in production. Which approach is the FASTEST way to improve the model's accuracy?

Community votes: C (11 votes, most voted); B (4 votes)

211

A data scientist has 20 TB of data in CSV format in an Amazon S3 bucket. The data scientist needs to convert the data to Apache Parquet format. How can the data scientist convert the file format with the LEAST amount of effort?

Community votes: B (7 votes, most voted)

212

A company is building a pipeline that periodically retrains its machine learning (ML) models by using new streaming data from devices. The company's data engineering team wants to build a data ingestion system that has high throughput, durable storage, and scalability. The company can tolerate up to 5 minutes of latency for data ingestion. The company needs a solution that can apply basic data transformation during the ingestion process. Which solution will meet these requirements with the MOST operational efficiency?

Community votes: A (9 votes, most voted); D (1 vote)

213

A retail company is ingesting purchasing records from its network of 20,000 stores to Amazon S3 by using Amazon Kinesis Data Firehose. The company uses a small, server-based application in each store to send the data to AWS over the internet. The company uses this data to train a machine learning model that is retrained each day. The company's data science team has identified existing attributes on these records that could be combined to create an improved model. Which change will create the required transformed records with the LEAST operational overhead?

Community votes: A (11 votes, most voted)
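To make "combining existing attributes" concrete, here is a minimal sketch of one way such a transformation could be written, assuming a Firehose data-transformation Lambda function is used. The field names `unit_price`, `quantity`, and `total_amount` are hypothetical and do not come from the question; the handler follows the record contract Firehose uses for transformation functions (`recordId`, `result`, base64-encoded `data`).

```python
import base64
import json


def combine_attributes(record: dict) -> dict:
    """Derive a new attribute from existing ones (field names are hypothetical)."""
    enriched = dict(record)
    enriched["total_amount"] = record["unit_price"] * record["quantity"]
    return enriched


def lambda_handler(event, context):
    """Transform each incoming Firehose record and return it in the expected shape."""
    output = []
    for rec in event["records"]:
        # Firehose delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(rec["data"]))
        transformed = combine_attributes(payload)
        # Newline-delimit the JSON so records stay separable once written to S3.
        data = base64.b64encode(
            (json.dumps(transformed) + "\n").encode("utf-8")
        ).decode("utf-8")
        output.append({"recordId": rec["recordId"], "result": "Ok", "data": data})
    return {"records": output}
```

The function can be exercised locally by passing a dictionary with a `records` list; no AWS resources are needed to check the transformation logic itself.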

214

A sports broadcasting company is planning to introduce subtitles in multiple languages for a live broadcast. The commentary is in English. The company needs the transcriptions to appear on screen in French or Spanish, depending on the broadcasting country. The transcriptions must be able to capture domain-specific terminology, names, and locations based on the commentary context. The company needs a solution that can support options to provide tuning data. Which combination of AWS services and features will meet these requirements with the LEAST operational overhead? (Choose two.)

Community votes: AE (17 votes, most voted); BE (17 votes)

215

A data scientist at a retail company is forecasting sales for a product over the next 3 months. After preliminary analysis, the data scientist identifies that sales are seasonal and that holidays affect sales. The data scientist also determines that sales of the product are correlated with sales of other products in the same category. The data scientist needs to train a sales forecasting model that incorporates this information. Which solution will meet this requirement with the LEAST development effort?

Community votes: B (15 votes, most voted); A (6 votes)

216

A company is building a predictive maintenance model for its warehouse equipment. The model must predict the probability of failure of all machines in the warehouse. The company has collected 10,000 event samples within 3 months. The event samples include 100 failure cases that are evenly distributed across 50 different machine types. How should the company prepare the data for the model to improve the model's accuracy?

Community votes: B (15 votes, most voted)
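The numbers in this question imply a strongly imbalanced dataset. The short sketch below only works through that arithmetic from the stated figures; the variable names are illustrative.

```python
# Figures taken directly from the question.
total_samples = 10_000
failure_cases = 100
machine_types = 50

# Share of samples that are failures.
failure_rate = failure_cases / total_samples

# Failures are evenly distributed, so each machine type has the same count.
failures_per_type = failure_cases / machine_types

# Accuracy of a trivial model that always predicts "no failure".
always_normal_accuracy = (total_samples - failure_cases) / total_samples

print(f"Failure rate: {failure_rate:.1%}")
print(f"Failures per machine type: {failures_per_type:.0f}")
print(f"Accuracy of always predicting 'no failure': {always_normal_accuracy:.1%}")
```

With so few positive examples per machine type, a model can score high accuracy while never detecting a failure, which is why the data preparation step matters here.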

217

A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products. Which solution will meet these requirements with the MOST operational efficiency?

Community votes: C (18 votes, most voted); B (2 votes); D (2 votes)

218

A sports analytics company is providing services at a marathon. Each runner in the marathon will have their race ID printed as text on the front of their shirt. The company needs to extract race IDs from images of the runners. Which solution will meet these requirements with the LEAST operational overhead?

Community votes: A (7 votes, most voted)

219

A manufacturing company wants to monitor its devices for anomalous behavior. A data scientist has trained an Amazon SageMaker scikit-learn model that classifies a device as normal or anomalous based on its 4-day telemetry. The 4-day telemetry of each device is collected in a separate file and is placed in an Amazon S3 bucket once every hour. The total time to run the model across the telemetry for all devices is 5 minutes. What is the MOST cost-effective solution for the company to use to run the model across the telemetry for all the devices?

Community votes: A (17 votes, most voted); B (1 vote)
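The cost angle of this question comes down to how little of each hour the model is actually running. The sketch below works through that arithmetic using only the figures stated in the question; it assumes one run per hourly file drop and makes no assumption about pricing.

```python
# Figures taken directly from the question.
run_minutes_per_batch = 5      # time to score all devices once
batches_per_day = 24           # telemetry arrives once every hour
minutes_per_day = 24 * 60

# Total compute time actually needed per day.
busy_minutes_per_day = run_minutes_per_batch * batches_per_day

# Fraction of the day that always-on compute would sit idle.
busy_fraction = busy_minutes_per_day / minutes_per_day
idle_fraction = 1 - busy_fraction

print(f"Compute needed per day: {busy_minutes_per_day} minutes")
print(f"Busy fraction: {busy_fraction:.1%}")
print(f"Idle fraction if compute runs continuously: {idle_fraction:.1%}")
```

Any option that keeps compute provisioned around the clock pays for the idle fraction as well as the busy one.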