Practice Set 27

Questions 261–270 (10 questions)

260

A social media company wants to develop a machine learning (ML) model to detect inappropriate or offensive content in images. The company has collected a large dataset of labeled images and plans to use the built-in Amazon SageMaker image classification algorithm to train the model. The company also intends to use SageMaker pipe mode to speed up the training.

The company splits the dataset into training, validation, and testing datasets. The company stores the training and validation images in folders that are named Training and Validation, respectively. The folders contain subfolders that correspond to the names of the dataset classes. The company resizes the images to the same size and generates two input manifest files, named training.lst and validation.lst, for the training dataset and the validation dataset, respectively. Finally, the company creates two separate Amazon S3 buckets for uploads of the training dataset and the validation dataset.

Which additional data preparation steps should the company take before uploading the files to Amazon S3?

Community votes: D (9 votes, most voted)
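The manifest files mentioned in this question can be illustrated with a small sketch. It assumes the tab-separated .lst layout used by the built-in image classification algorithm (image index, numeric class label, relative image path). The class names, file names, and the helper `build_lst_lines` are hypothetical and exist only for illustration; the sketch does not show the preparation steps the question asks about.

```python
from pathlib import PurePosixPath


def build_lst_lines(image_paths, class_names):
    """Return .lst lines in the form: index<TAB>label<TAB>relative/path.

    Assumes each path starts with its class subfolder, e.g. "safe/img_001.jpg".
    """
    # Map each class name to a stable numeric label (alphabetical order).
    label_of = {name: i for i, name in enumerate(sorted(class_names))}
    lines = []
    for index, path in enumerate(sorted(image_paths)):
        class_name = PurePosixPath(path).parts[0]  # first folder = class name
        lines.append(f"{index}\t{label_of[class_name]}\t{path}")
    return lines


# Hypothetical example data, not taken from the question.
example_paths = ["safe/img_001.jpg", "offensive/img_002.jpg", "safe/img_003.jpg"]
lst_lines = build_lst_lines(example_paths, ["safe", "offensive"])
print("\n".join(lst_lines))
```

Writing `lst_lines` to training.lst, one entry per line, would give a manifest of the kind the scenario describes.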

261

A media company wants to create a solution that identifies celebrities in pictures that users upload. The company also wants to identify the IP address and the timestamp details from the users so the company can prevent users from uploading pictures from unauthorized locations.

Which solution will meet these requirements with LEAST development effort?

Community votes: C (8 votes, most voted)

262

A pharmaceutical company performs periodic audits of clinical trial sites to quickly resolve critical findings. The company stores audit documents in text format. Auditors have requested help from a data science team to quickly analyze the documents. The auditors need to discover the 10 main topics within the documents to prioritize and distribute the review work among the auditing team members. Documents that describe adverse events must receive the highest priority.

A data scientist will use statistical modeling to discover abstract topics and to provide a list of the top words for each category to help the auditors assess the relevance of the topic.

Which algorithms are best suited to this scenario? (Choose two.)

Community votes: AC (10 votes, most voted)

263

A company needs to deploy a chatbot to answer common questions from customers. The chatbot must base its answers on company documentation.

Which solution will meet these requirements with the LEAST development effort?

Community votes: A (10 votes, most voted)

264

A company wants to conduct targeted marketing to sell solar panels to homeowners. The company wants to use machine learning (ML) technologies to identify which houses already have solar panels. The company has collected 8,000 satellite images as training data and will use Amazon SageMaker Ground Truth to label the data.

The company has a small internal team that is working on the project. The internal team has no ML expertise and no ML experience.

Which solution will meet these requirements with the LEAST amount of effort from the internal team?

Community votes: A (9 votes, most voted), B (3 votes), D (2 votes)

265

A company hosts a machine learning (ML) dataset repository on Amazon S3. A data scientist is preparing the repository to train a model. The data scientist needs to redact personally identifiable information (PII) from the dataset.

Which solution will meet these requirements with the LEAST development effort?

Community votes: C (7 votes, most voted)

266

A company is deploying a new machine learning (ML) model in a production environment. The company is concerned that the ML model will drift over time, so the company creates a script to aggregate all inputs and predictions into a single file at the end of each day. The company stores the file as an object in an Amazon S3 bucket. The total size of the daily file is 100 GB. The daily file size will increase over time.

Four times a year, the company samples the data from the previous 90 days to check the ML model for drift. After the 90-day period, the company must keep the files for compliance reasons.

The company needs to use S3 storage classes to minimize costs. The company wants to maintain the same storage durability of the data.

Which solution will meet these requirements?

Community votes: C (15 votes, most voted), D (13 votes), A (3 votes)

267

A company wants to enhance audits for its machine learning (ML) systems. The auditing system must be able to perform metadata analysis on the features that the ML models use. The audit solution must generate a report that analyzes the metadata. The solution also must be able to set the data sensitivity and authorship of features.

Which solution will meet these requirements with the LEAST development effort?

Community votes: D (8 votes, most voted), B (7 votes)

268

A machine learning (ML) specialist uploads a dataset to an Amazon S3 bucket that is protected by server-side encryption with AWS KMS keys (SSE-KMS). The ML specialist needs to ensure that an Amazon SageMaker notebook instance can read the dataset that is in Amazon S3.

Which solution will meet these requirements?

Community votes: C (6 votes, most voted)
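As background for this scenario, reading an SSE-KMS object requires permission to decrypt with the KMS key in addition to S3 read access. The sketch below builds an IAM-style policy document of that shape as a plain Python dictionary. The bucket name, key ARN, and the helper `notebook_read_policy` are hypothetical placeholders. Whether such permissions are granted through the notebook's IAM role, the KMS key policy, or both depends on the account setup, so this is an illustration of the permission pair and not a statement of the intended answer.

```python
import json


def notebook_read_policy(bucket, key_arn):
    """Return a policy document allowing S3 reads and KMS decryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadDataset",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
            {
                # Needed because the objects are encrypted with a KMS key.
                "Sid": "DecryptDataset",
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [key_arn],
            },
        ],
    }


# Hypothetical bucket name and key ARN, for illustration only.
policy = notebook_read_policy(
    "example-ml-dataset-bucket",
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
)
print(json.dumps(policy, indent=2))
```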

269

A company has a podcast platform that has thousands of users. The company implemented an algorithm to detect low podcast engagement based on a 10-minute running window of user events such as listening to, pausing, and closing the podcast. A machine learning (ML) specialist is designing the ingestion process for these events. The ML specialist needs to transform the data to prepare the data for inference.

How should the ML specialist design the transformation step to meet these requirements with the LEAST operational effort?

Community votes: C (5 votes, most voted)
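The 10-minute running window in this scenario can be pictured with a small local sketch. It keeps only the events from the last 600 seconds and counts them by type. The class `RunningWindow`, the event names, and the timestamps are hypothetical, and the sketch assumes events arrive in timestamp order. It illustrates the windowing idea only; a managed streaming service would handle this at scale, and the sketch does not indicate which option answers the question.

```python
from collections import deque

WINDOW_SECONDS = 600  # 10 minutes, as described in the scenario


class RunningWindow:
    """Keep events from the most recent window and count them by type."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp_seconds, event_type), oldest first

    def add(self, timestamp, event_type):
        self.events.append((timestamp, event_type))
        # Evict events that are a full window or more older than the newest one.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def counts(self):
        totals = {}
        for _, event_type in self.events:
            totals[event_type] = totals.get(event_type, 0) + 1
        return totals


# Hypothetical events for one user: (seconds since start, event type).
window = RunningWindow()
window.add(0, "listen")
window.add(300, "pause")
window.add(650, "close")  # the event at t=0 is now outside the window
window_counts = window.counts()
print(window_counts)
```

The per-type counts are the kind of feature that could then be passed to the engagement algorithm for inference.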