Practice Set 18
Questions 171–180 (10 questions)
A machine learning (ML) specialist wants to create a data preparation job that uses a PySpark script with complex window aggregation operations to create data for training and testing. The ML specialist needs to evaluate the impact of the number of features and the sample count on model performance.Which approach should the ML specialist use to determine the ideal data transformations for the model? [{"voted_answers": "D", "vote_count": 20, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 9, "is_most_voted": false}]
A data scientist has a dataset of machine part images stored in Amazon Elastic File System (Amazon EFS). The data scientist needs to use Amazon SageMaker to create and train an image classification machine learning model based on this dataset. Because of budget and time constraints, management wants the data scientist to create and train a model with the least number of steps and integration work required.How should the data scientist meet these requirements? [{"voted_answers": "D", "vote_count": 29, "is_most_voted": true}]
A retail company uses a machine learning (ML) model for daily sales forecasting. The company's brand manager reports that the model has provided inaccurate results for the past 3 weeks.At the end of each day, an AWS Glue job consolidates the input data that is used for the forecasting with the actual daily sales data and the predictions of the model. The AWS Glue job stores the data in Amazon S3. The company's ML team is using an Amazon SageMaker Studio notebook to gain an understanding about the source of the model's inaccuracies.What should the ML team do on the SageMaker Studio notebook to visualize the model's degradation MOST accurately? [{"voted_answers": "C", "vote_count": 11, "is_most_voted": true}, {"voted_answers": "D", "vote_count": 8, "is_most_voted": false}, {"voted_answers": "B", "vote_count": 6, "is_most_voted": false}]
An ecommerce company sends a weekly email newsletter to all of its customers. Management has hired a team of writers to create additional targeted content. A data scientist needs to identify five customer segments based on age, income, and location. The customers' current segmentation is unknown. The data scientist previously built an XGBoost model to predict the likelihood of a customer responding to an email based on age, income, and location.Why does the XGBoost model NOT meet the current requirements, and how can this be fixed? [{"voted_answers": "D", "vote_count": 20, "is_most_voted": true}]
A global financial company is using machine learning to automate its loan approval process. The company has a dataset of customer information. The dataset contains some categorical fields, such as customer location by city and housing status. The dataset also includes financial fields in different units, such as account balances in US dollars and monthly interest in US cents.The company's data scientists are using a gradient boosting regression model to infer the credit score for each customer. The model has a training accuracy of99% and a testing accuracy of 75%. The data scientists want to improve the model's testing accuracy.Which process will improve the testing accuracy the MOST? [{"voted_answers": "A", "vote_count": 21, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 1, "is_most_voted": false}]
A machine learning (ML) specialist needs to extract embedding vectors from a text series. The goal is to provide a ready-to-ingest feature space for a data scientist to develop downstream ML predictive models. The text consists of curated sentences in English. Many sentences use similar words but in different contexts. There are questions and answers among the sentences, and the embedding space must differentiate between them.Which options can produce the required embedding vectors that capture word context and sequential QA information? (Choose two.) [{"voted_answers": "AC", "vote_count": 16, "is_most_voted": true}, {"voted_answers": "BC", "vote_count": 15, "is_most_voted": false}, {"voted_answers": "CE", "vote_count": 12, "is_most_voted": false}, {"voted_answers": "BD", "vote_count": 9, "is_most_voted": false}, {"voted_answers": "AE", "vote_count": 7, "is_most_voted": false}, {"voted_answers": "BE", "vote_count": 6, "is_most_voted": false}, {"voted_answers": "CD", "vote_count": 3, "is_most_voted": false}]
A retail company wants to update its customer support system. The company wants to implement automatic routing of customer claims to different queues to prioritize the claims by category.Currently, an operator manually performs the category assignment and routing. After the operator classifies and routes the claim, the company stores the claim's record in a central database. The claim's record includes the claim's category.The company has no data science team or experience in the field of machine learning (ML). The company's small development team needs a solution that requires no ML expertise.Which solution meets these requirements? [{"voted_answers": "D", "vote_count": 27, "is_most_voted": true}, {"voted_answers": "A", "vote_count": 1, "is_most_voted": false}]
A machine learning (ML) specialist is using Amazon SageMaker hyperparameter optimization (HPO) to improve a model's accuracy. The learning rate parameter is specified in the following HPO configuration:During the results analysis, the ML specialist determines that most of the training jobs had a learning rate between 0.01 and 0.1. The best result had a learning rate of less than 0.01. Training jobs need to run regularly over a changing dataset. The ML specialist needs to find a tuning mechanism that uses different learning rates more evenly from the provided range between MinValue and MaxValue.Which solution provides the MOST accurate result? [{"voted_answers": "C", "vote_count": 20, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 4, "is_most_voted": false}]
A manufacturing company wants to use machine learning (ML) to automate quality control in its facilities. The facilities are in remote locations and have limited internet connectivity. The company has 20 ׀¢׀’ of training data that consists of labeled images of defective product parts. The training data is in the corporate on- premises data center.The company will use this data to train a model for real-time defect detection in new parts as the parts move on a conveyor belt in the facilities. The company needs a solution that minimizes costs for compute infrastructure and that maximizes the scalability of resources for training. The solution also must facilitate the company's use of an ML model in the low-connectivity environments.Which solution will meet these requirements? [{"voted_answers": "C", "vote_count": 24, "is_most_voted": true}]
A company has an ecommerce website with a product recommendation engine built in TensorFlow. The recommendation engine endpoint is hosted by AmazonSageMaker. Three compute-optimized instances support the expected peak load of the website.Response times on the product recommendation page are increasing at the beginning of each month. Some users are encountering errors. The website receives the majority of its traffic between 8 AM and 6 PM on weekdays in a single time zone.Which of the following options are the MOST effective in solving the issue while keeping costs to a minimum? (Choose two.) [{"voted_answers": "AC", "vote_count": 31, "is_most_voted": true}, {"voted_answers": "CE", "vote_count": 6, "is_most_voted": false}, {"voted_answers": "AE", "vote_count": 5, "is_most_voted": false}]