Practice Set 16
Questions 151–160 (10 questions)
A data science team is planning to build a natural language processing (NLP) application. The application's text preprocessing stage will include part-of-speech tagging and key phrase extraction. The preprocessed text will be input to a custom classification algorithm that the data science team has already written and trained using Apache MXNet. Which solution can the team build MOST quickly to meet these requirements? [{"voted_answers": "D", "vote_count": 32, "is_most_voted": true}, {"voted_answers": "A", "vote_count": 17, "is_most_voted": false}]
A machine learning (ML) specialist must develop a classification model for a financial services company. A domain expert provides the dataset, which is tabular with 10,000 rows and 1,020 features. During exploratory data analysis, the specialist finds no missing values and a small percentage of duplicate rows. There are correlation scores of > 0.9 for 200 feature pairs. The mean value of each feature is similar to its 50th percentile. Which feature engineering strategy should the ML specialist use with Amazon SageMaker? [{"voted_answers": "A", "vote_count": 19, "is_most_voted": true}]
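Study note: the exploratory finding above (feature pairs with correlation above 0.9) could be reproduced with a check along these lines. This is a minimal sketch on synthetic data using only the standard library; the function names `pearson` and `highly_correlated_pairs` and the column names are hypothetical and not part of the question. It only detects redundant pairs and does not decide which feature engineering strategy to apply.

```python
import math
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| > threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs

# Synthetic example (assumption): f2 is a noisy copy of f0, f1 is independent.
random.seed(0)
f0 = [random.gauss(0, 1) for _ in range(500)]
f1 = [random.gauss(0, 1) for _ in range(500)]
f2 = [x + random.gauss(0, 0.05) for x in f0]

pairs = highly_correlated_pairs({"f0": f0, "f1": f1, "f2": f2})
for name_a, name_b, r in pairs:
    print(f"{name_a} ~ {name_b}: r = {r:.3f}")
```

With 1,020 features the same loop would visit roughly 520,000 pairs, so a vectorized correlation matrix would likely be preferable in practice.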
A manufacturing company asks its machine learning specialist to develop a model that classifies defective parts into one of eight defect types. The company has provided roughly 100,000 images per defect type for training. During the initial training of the image classification model, the specialist notices that the validation accuracy is 80%, while the training accuracy is 90%. It is known that human-level performance for this type of image classification is around 90%. What should the specialist consider to fix this issue? [{"voted_answers": "D", "vote_count": 9, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 1, "is_most_voted": false}]
A machine learning specialist needs to analyze comments on a news website with users across the globe. The specialist must find the most discussed topics in the comments that are in either English or Spanish. What steps could be used to accomplish this task? (Choose two.) [{"voted_answers": "C", "vote_count": 15, "is_most_voted": true}, {"voted_answers": "E", "vote_count": 11, "is_most_voted": false}, {"voted_answers": "B", "vote_count": 10, "is_most_voted": false}, {"voted_answers": "D", "vote_count": 1, "is_most_voted": false}]
A machine learning (ML) specialist is administering a production Amazon SageMaker endpoint with model monitoring configured. Amazon SageMaker Model Monitor detects violations on the SageMaker endpoint, so the ML specialist retrains the model with the latest dataset. This dataset is statistically representative of the current production traffic. The ML specialist notices that even after deploying the new SageMaker model and running the first monitoring job, the SageMaker endpoint still has violations. What should the ML specialist do to resolve the violations? [{"voted_answers": "B", "vote_count": 9, "is_most_voted": true}, {"voted_answers": "D", "vote_count": 1, "is_most_voted": false}]
A company supplies wholesale clothing to thousands of retail stores. A data scientist must create a model that predicts the daily sales volume for each item for each store. The data scientist discovers that more than half of the stores have been in business for less than 6 months. Sales data is highly consistent from week to week. Daily data from the database has been aggregated weekly, and weeks with no sales are omitted from the current dataset. Five years (100 MB) of sales data is available in Amazon S3. Which factors will adversely impact the performance of the forecast model to be developed, and which actions should the data scientist take to mitigate them? (Choose two.) [{"voted_answers": "AC", "vote_count": 36, "is_most_voted": true}, {"voted_answers": "CD", "vote_count": 10, "is_most_voted": false}, {"voted_answers": "AD", "vote_count": 6, "is_most_voted": false}]
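Study note: the scenario above says that weeks with no sales are omitted from the weekly dataset. One way such gaps could be made explicit is sketched below, assuming the series is held as a dictionary keyed by week start date. The function name `fill_missing_weeks` and the sample figures are hypothetical; the question does not state how the data is stored or which mitigation is expected.

```python
from datetime import date, timedelta

def fill_missing_weeks(weekly_sales, start, end):
    """Return a dict with one entry per week from start to end (inclusive).

    Weeks absent from weekly_sales are recorded as 0 rather than skipped.
    """
    filled = {}
    week = start
    while week <= end:
        filled[week] = weekly_sales.get(week, 0)
        week += timedelta(weeks=1)
    return filled

# Hypothetical sparse series: the weeks of Jan 8 and Jan 22 had no sales
# and are therefore missing from the input.
sparse = {date(2024, 1, 1): 12, date(2024, 1, 15): 7}
dense = fill_missing_weeks(sparse, date(2024, 1, 1), date(2024, 1, 22))

for week, units in dense.items():
    print(week.isoformat(), units)
```

Restoring the zero-sales weeks keeps the time index evenly spaced, which most forecasting methods assume.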
An ecommerce company is automating the categorization of its products based on images. A data scientist has trained a computer vision model using the Amazon SageMaker image classification algorithm. The images for each product are classified according to specific product lines. The accuracy of the model is too low when categorizing new products. All of the product images have the same dimensions and are stored within an Amazon S3 bucket. The company wants to improve the model so it can be used for new products as soon as possible. Which steps would improve the accuracy of the solution? (Choose three.) [{"voted_answers": "CDF", "vote_count": 16, "is_most_voted": true}, {"voted_answers": "CEF", "vote_count": 13, "is_most_voted": false}, {"voted_answers": "BCE", "vote_count": 1, "is_most_voted": false}]
A data scientist is training a text classification model by using the Amazon SageMaker built-in BlazingText algorithm. There are 5 classes in the dataset, with 300 samples for category A, 292 samples for category B, 240 samples for category C, 258 samples for category D, and 310 samples for category E. The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets. What could the data scientist conclude from these results? [{"voted_answers": "A", "vote_count": 13, "is_most_voted": true}, {"voted_answers": "B", "vote_count": 3, "is_most_voted": false}, {"voted_answers": "D", "vote_count": 1, "is_most_voted": false}]
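Study note: the class counts stated above can be turned into a quick balance check before reading any confusion matrix. The sketch below is plain arithmetic on the numbers given in the question; the variable names are illustrative only, and it says nothing about the confusion matrices themselves.

```python
# Per-class sample counts as stated in the question.
counts = {"A": 300, "B": 292, "C": 240, "D": 258, "E": 310}

total = sum(counts.values())   # size of the full dataset
test_size = total // 10        # 10% held out for testing
train_size = total - test_size

# Share of each class and the ratio of largest to smallest class.
shares = {label: n / total for label, n in counts.items()}
imbalance_ratio = max(counts.values()) / min(counts.values())

print(f"total={total}, train={train_size}, test={test_size}")
for label, share in shares.items():
    print(f"class {label}: {share:.1%}")
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
```

The total is 1,400 samples, so the test split holds 140 of them, and the largest class is about 1.29 times the size of the smallest.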
A company that manufactures mobile devices wants to determine and calibrate the appropriate sales price for its devices. The company is collecting the relevant data and is determining data features that it can use to train machine learning (ML) models. There are more than 1,000 features, and the company wants to determine the primary features that contribute to the sales price. Which techniques should the company use for feature selection? (Choose three.) [{"voted_answers": "BDE", "vote_count": 35, "is_most_voted": true}]
A power company wants to forecast future energy consumption for its customers in residential properties and commercial business properties. Historical power consumption data for the last 10 years is available. A team of data scientists who performed the initial data analysis and feature selection will include the historical power consumption data and data such as weather, number of individuals on the property, and public holidays. The data scientists are using Amazon Forecast to generate the forecasts. Which algorithm in Forecast should the data scientists use to meet these requirements? [{"voted_answers": "C", "vote_count": 30, "is_most_voted": true}, {"voted_answers": "A", "vote_count": 3, "is_most_voted": false}, {"voted_answers": "D", "vote_count": 1, "is_most_voted": false}]