Practice Set 15

Questions 140–149 (10 questions)

140

A retail company wants to combine its customer orders with the product description data from its product catalog. The structure and format of the records in each dataset are different. A data analyst tried to use a spreadsheet to combine the datasets, but the effort resulted in duplicate records and records that were not properly combined. The company needs a solution that it can use to combine similar records from the two datasets and remove any duplicates.

Which solution will meet these requirements?

Community votes: C (27, most voted), D (1)
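The managed AWS route for this kind of task is typically an ML-based deduplication transform, but the core idea of fuzzy record matching can be sketched in standard-library Python. The `product` field and the 0.85 threshold below are illustrative assumptions, not part of the question:

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy string match using difflib's similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def dedupe(records):
    """Keep only the first of any group of near-duplicate product names."""
    kept = []
    for rec in records:
        if not any(similar(rec["product"], k["product"]) for k in kept):
            kept.append(rec)
    return kept

orders = [
    {"product": "Wireless Mouse M185"},
    {"product": "wireless mouse m185"},   # near-duplicate of the first
    {"product": "USB-C Charging Cable"},
]
print(len(dedupe(orders)))  # 2 unique products remain
```

A production solution would also normalize fields and match on multiple columns, which is exactly what a managed matching transform handles for you.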

141

A company provisions Amazon SageMaker notebook instances for its data science team and creates Amazon VPC interface endpoints to ensure communication between the VPC and the notebook instances. All connections to the Amazon SageMaker API are contained entirely and securely within the AWS network. However, the data science team realizes that individuals outside the VPC can still connect to the notebook instances across the internet.

Which set of actions should the data science team take to fix the issue?

Community votes: B (24, most voted), A (16)
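One control commonly paired with interface endpoints is an IAM policy that denies presigned-URL access to the notebook unless the request arrives through the VPC endpoint. A minimal sketch of such a policy document, assuming a placeholder endpoint ID:

```python
import json

VPCE_ID = "vpce-0123456789abcdef0"  # placeholder interface endpoint ID

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyNotebookAccessOutsideVpce",
        "Effect": "Deny",
        "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
        "Resource": "*",
        "Condition": {
            # Deny unless the call traverses the named interface endpoint
            "StringNotEquals": {"aws:sourceVpce": VPCE_ID}
        },
    }],
}

print(json.dumps(policy, indent=2))
```

This is only one piece of the fix; restricting the notebook's security groups and network configuration matters as well.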

142

A company will use Amazon SageMaker to train and host a machine learning (ML) model for a marketing campaign. The majority of the data is sensitive customer data. The data must be encrypted at rest. The company wants AWS to maintain the root of trust for the master keys and wants encryption key usage to be logged.

Which implementation will meet these requirements?

Community votes: C (6, most voted)
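For context, SageMaker accepts a customer managed AWS KMS key for both output artifacts and the training storage volume, and KMS key usage is logged through AWS CloudTrail. A sketch of the relevant `create_training_job` request fields; the key ARN, bucket, job name, and instance sizing are placeholders:

```python
# Request fragment for sagemaker.create_training_job; only the
# encryption-related fields are shown. ARNs and paths are placeholders.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"

training_job_config = {
    "TrainingJobName": "marketing-model-demo",
    "OutputDataConfig": {
        "S3OutputPath": "s3://example-bucket/output/",
        "KmsKeyId": KMS_KEY_ARN,        # encrypts model artifacts at rest
    },
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
        "VolumeKmsKeyId": KMS_KEY_ARN,  # encrypts the ML storage volume
    },
}

print(training_job_config["OutputDataConfig"]["KmsKeyId"])
```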

143

A machine learning specialist stores IoT soil sensor data in an Amazon DynamoDB table and stores weather event data as JSON files in Amazon S3. The dataset in DynamoDB is 10 GB in size and the dataset in Amazon S3 is 5 GB in size. The specialist wants to train a model on this data to help predict soil moisture levels as a function of weather events using Amazon SageMaker.

Which solution will accomplish the necessary transformation to train the Amazon SageMaker model with the LEAST amount of administrative overhead?

Community votes: D (22, most voted)
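Whichever service performs the transformation, the heart of it is joining the two datasets on a shared key before training. A toy sketch with hypothetical field names standing in for the real schemas:

```python
# Toy join of soil-sensor readings with weather events on a shared
# date key; the field names are illustrative, not from the question.
sensors = [
    {"date": "2023-06-01", "moisture": 0.31},
    {"date": "2023-06-02", "moisture": 0.45},
]
weather = [
    {"date": "2023-06-01", "event": "clear"},
    {"date": "2023-06-02", "event": "rain"},
]

# Index the weather events by date, then enrich each sensor row.
weather_by_date = {w["date"]: w["event"] for w in weather}
training_rows = [
    {**s, "event": weather_by_date.get(s["date"], "unknown")}
    for s in sensors
]
print(training_rows[1])  # moisture reading paired with the "rain" event
```

At 15 GB total, this join is a natural fit for a serverless ETL service rather than hand-rolled scripts, which is where the "least administrative overhead" framing points.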

144

A company sells thousands of products on a public website and wants to automatically identify products with potential durability problems. The company has 1,000 reviews with date, star rating, review text, review summary, and customer email fields, but many reviews are incomplete and have empty fields. Each review has already been labeled with the correct durability result. A machine learning specialist must train a model to identify reviews expressing concerns over product durability. The first model needs to be trained and ready to review in 2 days.

What is the MOST direct approach to solve this problem within 2 days?

Community votes: A (29, most voted), C (7), D (3)
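If a managed text classifier such as Amazon Comprehend custom classification is used, the labeled reviews must first be shaped into its two-column training CSV (label first, then document text). A sketch with made-up labels and review text:

```python
import csv
import io

# Comprehend custom classification expects a CSV with the class label
# in the first column and the document text in the second. The labels
# and reviews below are invented for illustration.
reviews = [
    ("DURABILITY_CONCERN", "The handle snapped after two weeks."),
    ("NO_CONCERN", "Great color, arrived on time."),
]

buf = io.StringIO()
writer = csv.writer(buf)
for label, text in reviews:
    writer.writerow([label, text])

print(buf.getvalue().splitlines()[0])
```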

145

A company that runs an online library is implementing a chatbot using Amazon Lex to provide book recommendations based on category. This intent is fulfilled by an AWS Lambda function that queries an Amazon DynamoDB table for a list of book titles, given a particular category. For testing, only three categories are implemented as the custom slot types: "comedy," "adventure," and "documentary." A machine learning (ML) specialist notices that sometimes the request cannot be fulfilled because Amazon Lex cannot understand the category spoken by users with utterances such as "funny," "fun," and "humor." The ML specialist needs to fix the problem without changing the Lambda code or data in DynamoDB.

How should the ML specialist fix the problem?

Community votes: D (17, most voted)
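For reference, Lex custom slot types can carry synonyms on each enumeration value, which lets an utterance like "funny" resolve to the canonical value "comedy" without touching the Lambda code or DynamoDB. A sketch of the (Lex V1) PutSlotType request shape; the synonyms beyond those quoted in the question are illustrative:

```python
# Shape of a Lex (V1) PutSlotType request that attaches synonyms to an
# existing custom slot type. TOP_RESOLUTION makes Lex resolve synonyms
# back to the canonical value the Lambda function already expects.
slot_type_request = {
    "name": "BookCategory",
    "valueSelectionStrategy": "TOP_RESOLUTION",
    "enumerationValues": [
        {"value": "comedy", "synonyms": ["funny", "fun", "humor"]},
        {"value": "adventure", "synonyms": ["action"]},       # illustrative
        {"value": "documentary", "synonyms": ["nonfiction"]}, # illustrative
    ],
}

comedy = slot_type_request["enumerationValues"][0]
print("humor" in comedy["synonyms"])
```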

146

A manufacturing company uses machine learning (ML) models to detect quality issues. The models use images that are taken of the company's product at the end of each production step. The company has thousands of machines at the production site that generate one image per second on average.

The company ran a successful pilot with a single manufacturing machine. For the pilot, ML specialists used an industrial PC that ran AWS IoT Greengrass with a long-running AWS Lambda function that uploaded the images to Amazon S3. The uploaded images invoked a Lambda function that was written in Python to perform inference by using an Amazon SageMaker endpoint that ran a custom model. The inference results were forwarded back to a web service that was hosted at the production site to prevent faulty products from being shipped.

The company scaled the solution out to all manufacturing machines by installing similarly configured industrial PCs on each production machine. However, latency for predictions increased beyond acceptable limits. Analysis shows that the internet connection is at its capacity limit.

How can the company resolve this issue MOST cost-effectively?

Community votes: D (23, most voted)
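The bandwidth problem disappears if inference runs on the industrial PCs themselves and only small verdicts, not raw images, cross the uplink. A minimal sketch of that pattern; `local_inference` is a stand-in for a real on-device model, not an actual implementation:

```python
# Sketch of edge inference: score each frame locally and send only a
# tiny JSON verdict over the constrained internet connection.
def local_inference(image_bytes: bytes) -> float:
    """Placeholder for an on-device model; returns a fake defect score."""
    return 0.1 if len(image_bytes) % 2 == 0 else 0.9

def process_frame(image_bytes: bytes, threshold: float = 0.5) -> dict:
    score = local_inference(image_bytes)
    # Only this small verdict leaves the site, not the full image.
    return {"defect": score >= threshold, "score": score}

print(process_frame(b"\x00" * 100))
```

The design point is the payload size: a per-frame verdict is a few dozen bytes, versus a full image per second per machine.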

147

A data scientist is using an Amazon SageMaker notebook instance and needs to securely access data stored in a specific Amazon S3 bucket.

How should the data scientist accomplish this?

Community votes: C (13, most voted), A (4), B (1)
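Whichever option is chosen, scoping the notebook's execution role to a single bucket usually comes down to an IAM policy like the sketch below (the bucket name is a placeholder). Note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to object ARNs:

```python
import json

BUCKET = "example-data-bucket"  # placeholder bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            f"arn:aws:s3:::{BUCKET}",     # ListBucket targets the bucket
            f"arn:aws:s3:::{BUCKET}/*",   # GetObject targets its objects
        ],
    }],
}

print(json.dumps(policy))
```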

148

A company is launching a new product and needs to build a mechanism to monitor comments about the company and its new product on social media. The company needs to be able to evaluate the sentiment expressed in social media posts, visualize trends, and configure alarms based on various thresholds. The company needs to implement this solution quickly, and wants to minimize the infrastructure and data science resources needed to evaluate the messages. The company already has a solution in place to collect posts and store them within an Amazon S3 bucket.

What services should the data science team use to deliver this solution?

Community votes: D (19, most voted), C (2)
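For context, Amazon Comprehend's BatchDetectSentiment API returns one sentiment result per submitted document, which is the kind of zero-infrastructure evaluation the question asks for. A sketch of the request shape alongside an illustrative (not real) response; no API call is made here:

```python
# Shape of a Comprehend BatchDetectSentiment request; the posts are
# sample data invented for illustration.
request = {
    "TextList": ["Love the new product!", "Shipping was slow."],
    "LanguageCode": "en",
}

# An illustrative response: one result per input document.
sample_response = {
    "ResultList": [
        {"Index": 0, "Sentiment": "POSITIVE"},
        {"Index": 1, "Sentiment": "NEGATIVE"},
    ]
}

print(len(sample_response["ResultList"]) == len(request["TextList"]))
```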

149

A bank wants to launch a low-rate credit promotion. The bank is located in a town that recently experienced economic hardship. Only some of the bank's customers were affected by the crisis, so the bank's credit team must identify which customers to target with the promotion. However, the credit team wants to make sure that loyal customers' full credit history is considered when the decision is made.

The bank's data science team developed a model that classifies account transactions and understands credit eligibility. The data science team used the XGBoost algorithm to train the model. The team used 7 years of bank transaction historical data for training and hyperparameter tuning over the course of several days.

The accuracy of the model is sufficient, but the credit team is struggling to explain accurately why the model denies credit to some customers. The credit team has almost no skill in data science.

What should the data science team do to address this issue in the MOST operationally efficient manner?

Community votes: B (21, most voted), A (8), C (7)