Practice Set 24
Questions 231–240 (10 questions)
An ecommerce company is collecting structured data and unstructured data from its website, mobile apps, and IoT devices. The data is stored in several databases and Amazon S3 buckets. The company is implementing a scalable repository to store structured data and unstructured data. The company must implement a solution that provides a central data catalog, self-service access to the data, and granular data access policies and encryption to protect the data. Which combination of actions will meet these requirements with the LEAST amount of setup? (Choose three.) [{"voted_answers": "ACE", "vote_count": 12, "is_most_voted": true}, {"voted_answers": "ACD", "vote_count": 2, "is_most_voted": false}]
A machine learning (ML) specialist is developing a deep learning sentiment analysis model that is based on data from movie reviews. After the ML specialist trains the model and reviews the model results on the validation set, the ML specialist discovers that the model is overfitting. Which solutions will MOST improve the model generalization and reduce overfitting? (Choose three.) [{"voted_answers": "DEF", "vote_count": 18, "is_most_voted": true}, {"voted_answers": "BDE", "vote_count": 3, "is_most_voted": false}]
An online advertising company is developing a linear model to predict the bid price of advertisements in real time with low-latency predictions. A data scientist has trained the linear model by using many features, but the model is overfitting the training dataset. The data scientist needs to prevent overfitting and must reduce the number of features. Which solution will meet these requirements? [{"voted_answers": "A", "vote_count": 11, "is_most_voted": true}]
A credit card company wants to identify fraudulent transactions in real time. A data scientist builds a machine learning model for this purpose. The transactional data is captured and stored in Amazon S3. The historic data is already labeled with two classes: fraud (positive) and fair transactions (negative). The data scientist removes all the missing data and builds a classifier by using the XGBoost algorithm in Amazon SageMaker. The model produces the following results:
• True positive rate (TPR): 0.700
• False negative rate (FNR): 0.300
• True negative rate (TNR): 0.977
• False positive rate (FPR): 0.023
• Overall accuracy: 0.949
Which solution should the data scientist use to improve the performance of the model? [{"voted_answers": "A", "vote_count": 12, "is_most_voted": true}]
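The rates quoted in the question above are enough to back out how imbalanced the data is. The sketch below is only a worked calculation, not part of the original question: it assumes overall accuracy is the prevalence-weighted mix of TPR and TNR, and the function names are hypothetical.

```python
def fraud_prevalence(tpr: float, tnr: float, accuracy: float) -> float:
    """Solve accuracy = p * tpr + (1 - p) * tnr for the positive-class share p."""
    return (tnr - accuracy) / (tnr - tpr)


def precision(tpr: float, fpr: float, p: float) -> float:
    """Share of predicted-fraud transactions that are truly fraud."""
    true_pos = p * tpr          # fraud correctly flagged
    false_pos = (1 - p) * fpr   # fair transactions wrongly flagged
    return true_pos / (true_pos + false_pos)


# Values taken from the question.
p = fraud_prevalence(tpr=0.700, tnr=0.977, accuracy=0.949)
prec = precision(tpr=0.700, fpr=0.023, p=p)

print(f"implied fraud share: {p:.3f}")
print(f"implied precision:   {prec:.3f}")
```

Under that assumption, roughly one transaction in ten is fraud, so the high overall accuracy is driven mostly by the majority class while 30% of fraud cases are still missed.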
A company is training machine learning (ML) models on Amazon SageMaker by using 200 TB of data that is stored in Amazon S3 buckets. The training data consists of individual files that are each larger than 200 MB in size. The company needs a data access solution that offers the shortest processing time and the least amount of setup. Which solution will meet these requirements? [{"voted_answers": "D", "vote_count": 26, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 11, "is_most_voted": false}]
An online store is predicting future book sales by using a linear regression model that is based on past sales data. The data includes duration, a numerical feature that represents the number of days that a book has been listed in the online store. A data scientist performs an exploratory data analysis and discovers that the relationship between book sales and duration is skewed and non-linear. Which data transformation step should the data scientist take to improve the predictions of the model? [{"voted_answers": "C", "vote_count": 20, "is_most_voted": true}]
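To illustrate why transforming a skewed feature can help a linear model, here is a minimal sketch. It assumes a logarithmic transformation, which is one common choice for skewed data; the answer options are not shown in this set, so this is an illustration and not the confirmed answer. The data is hypothetical and built so that sales grow with the logarithm of duration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: days listed (right-skewed) and sales that follow log(duration).
durations = [1, 2, 5, 10, 30, 90, 365]
sales = [3 + 4 * math.log(d) for d in durations]

corr_raw = pearson(durations, sales)                         # linear fit on the raw feature
corr_log = pearson([math.log(d) for d in durations], sales)  # linear fit after the transform

print(f"correlation with raw duration: {corr_raw:.3f}")
print(f"correlation with log(duration): {corr_log:.3f}")
```

On this constructed data the transformed feature is linearly related to sales, while the raw feature is not, which is the property a linear regression model relies on.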
A company's data engineer wants to use Amazon S3 to share datasets with data scientists. The data scientists work in three departments: Finance, Marketing, and Human Resources. Each department has its own IAM user group. Some datasets contain sensitive information and should be accessed only by the data scientists from the Finance department. How can the data engineer set up access to meet these requirements? [{"voted_answers": "C", "vote_count": 15, "is_most_voted": true}, {"voted_answers": "D", "vote_count": 6, "is_most_voted": false}, {"voted_answers": "B", "vote_count": 1, "is_most_voted": false}]
A company operates an amusement park. The company wants to collect, monitor, and store real-time traffic data at several park entrances by using strategically placed cameras. The company’s security team must be able to immediately access the data for viewing. Stored data must be indexed and must be accessible to the company’s data science team. Which solution will meet these requirements MOST cost-effectively? [{"voted_answers": "B", "vote_count": 6, "is_most_voted": true}]
An engraving company wants to automate its quality control process for plaques. The company performs the process before mailing each customized plaque to a customer. The company has created an Amazon S3 bucket that contains images of defects that should cause a plaque to be rejected. Low-confidence predictions must be sent to an internal team of reviewers who are using Amazon Augmented AI (Amazon A2I). Which solution will meet these requirements? [{"voted_answers": "B", "vote_count": 10, "is_most_voted": true}]
A machine learning (ML) engineer at a bank is building a data ingestion solution to provide transaction features to financial ML models. Raw transactional data is available in an Amazon Kinesis data stream. The solution must compute rolling averages of the ingested data from the data stream and must store the results in Amazon SageMaker Feature Store. The solution also must serve the results to the models in near real time. Which solution will meet these requirements? [{"voted_answers": "C", "vote_count": 7, "is_most_voted": true}]
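The rolling-average computation that the last question refers to can be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses a count-based window over in-memory values, the `RollingAverage` class and the sample amounts are hypothetical, and it makes no calls to Amazon Kinesis or SageMaker Feature Store. A streaming service would typically use time-based windows instead.

```python
from collections import deque


class RollingAverage:
    """Mean of the most recent `window` values, updated one record at a time."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.values = deque()
        self.total = 0.0

    def update(self, value: float) -> float:
        """Add a new value, evict the oldest if the window is full, return the mean."""
        self.values.append(value)
        self.total += value
        if len(self.values) > self.window:
            self.total -= self.values.popleft()
        return self.total / len(self.values)


# Hypothetical transaction amounts arriving in order.
amounts = [10.0, 20.0, 30.0, 40.0]
rolling = RollingAverage(window=3)
averages = [rolling.update(a) for a in amounts]

print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

Keeping a running total means each update costs constant time, which matters when features must be refreshed in near real time.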