Practice Set 11

Questions 101–110 (10 questions)

101

A technology startup is using complex deep neural networks and GPU compute to recommend the company's products to its existing customers based upon each customer's habits and interactions. The solution currently pulls each dataset from an Amazon S3 bucket before loading the data into a TensorFlow model pulled from the company's Git repository that runs locally. This job then runs for several hours while continually outputting its progress to the same S3 bucket. The job can be paused, restarted, and continued at any time in the event of a failure, and is run from a central queue.

Senior managers are concerned about the complexity of the solution's resource management and the costs involved in repeating the process regularly. They ask for the workload to be automated so it runs once a week, starting Monday and completing by the close of business Friday.

Which architecture should be used to scale the solution at the lowest cost?

Community votes: A (13 votes, most voted), B (3 votes), C (2 votes)

102

A Machine Learning Specialist prepared the following graph displaying the results of k-means for k = [1..10]:

Considering the graph, what is a reasonable selection for the optimal choice of k?

Community votes: B (11 votes, most voted)
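
Questions of this kind are usually read with the elbow method: pick the k after which the within-cluster sum of squares (WCSS) stops dropping sharply. The sketch below shows one common heuristic for locating that bend, the largest second difference of the WCSS curve. The function name `elbow_k` and the WCSS values are hypothetical; they are not read from the graph in this question.

```python
def elbow_k(wcss):
    """Return the k at the sharpest bend of a WCSS curve.

    wcss[i] is the within-cluster sum of squares for k = i + 1.
    The bend is measured by the discrete second difference, which is
    only one of several elbow heuristics.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Made-up WCSS values for k = 1..10 (illustration only).
example_wcss = [1000, 700, 400, 120, 100, 85, 75, 68, 63, 60]
print(elbow_k(example_wcss))  # prints 4 for these made-up values
```

With a real plot, the same judgment is normally made by eye; the heuristic only formalizes "where the curve flattens".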

103

A media company with a very large archive of unlabeled images, text, audio, and video footage wishes to index its assets to allow rapid identification of relevant content by the Research team. The company wants to use machine learning to accelerate the efforts of its in-house researchers who have limited machine learning expertise.

Which is the FASTEST route to index the assets?

Community votes: A (5 votes, most voted)

104

A Machine Learning Specialist is working for an online retailer that wants to run analytics on every customer visit, processed through a machine learning pipeline.

The data needs to be ingested by Amazon Kinesis Data Streams at up to 100 transactions per second, and the JSON data blob is 100 KB in size.

What is the MINIMUM number of shards in Kinesis Data Streams the Specialist should use to successfully ingest this data?

Community votes: B (4 votes, most voted)
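
The arithmetic behind this question can be sketched from the per-shard write limits of Kinesis Data Streams: 1 MB per second or 1,000 records per second, whichever is reached first. The function name `min_shards` and the choice of decimal units (1 KB = 1,000 bytes) are assumptions made for illustration; binary units lead to the same rounded result here.

```python
import math

# Per-shard write limits for Kinesis Data Streams.
SHARD_MAX_BYTES_PER_SEC = 1_000_000   # 1 MB/s (decimal units assumed)
SHARD_MAX_RECORDS_PER_SEC = 1_000     # 1,000 records/s


def min_shards(records_per_sec, record_size_bytes):
    """Smallest shard count that satisfies both write limits."""
    by_throughput = math.ceil(
        records_per_sec * record_size_bytes / SHARD_MAX_BYTES_PER_SEC
    )
    by_record_rate = math.ceil(records_per_sec / SHARD_MAX_RECORDS_PER_SEC)
    return max(by_throughput, by_record_rate, 1)


# 100 transactions per second, each a 100 KB JSON blob.
print(min_shards(100, 100_000))  # prints 10
```

Here throughput is the binding limit: 100 records/s x 100 KB = 10 MB/s, while the record rate alone would fit in a single shard.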

105

A Machine Learning Specialist is deciding between building a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each pair of features and finds that their absolute values range from 0.1 to 0.95.

Which model describes the underlying data in this situation?

Community votes: D (9 votes, most voted)
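
The check described in this question can be sketched in a few lines. A naive Bayes model assumes the features are conditionally independent given the class, so strongly correlated feature pairs are a warning sign for that assumption. The helper names, the threshold, and the sample data below are hypothetical, and Pearson correlation only captures linear dependence, so this is a rough screen rather than a proof of (in)dependence.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongly_correlated_pairs(features, threshold=0.8):
    """List feature pairs whose |r| meets the (assumed) threshold."""
    return [
        (a, b)
        for a, b in combinations(features, 2)
        if abs(pearson(features[a], features[b])) >= threshold
    ]


# Hypothetical feature columns, for illustration only.
features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],   # perfectly correlated with f1
    "f3": [3, 1, 4, 1, 5],
}
print(strongly_correlated_pairs(features))
```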

106

A Data Scientist is building a linear regression model and will use resulting p-values to evaluate the statistical significance of each coefficient. Upon inspection of the dataset, the Data Scientist discovers that most of the features are normally distributed. The plot of one feature in the dataset is shown in the graphic.

What transformation should the Data Scientist apply to satisfy the statistical assumptions of the linear regression model?

Community votes: B (7 votes, most voted)

107

A Machine Learning Specialist is assigned to a Fraud Detection team and must tune an XGBoost model, which is working appropriately for test data. However, with unknown data, it is not working as expected. The existing parameters are provided as follows.

Which parameter tuning guidelines should the Specialist follow to avoid overfitting?

Community votes: B (3 votes, most voted)

108

A data scientist is developing a pipeline to ingest streaming web traffic data. The data scientist needs to implement a process to identify unusual web traffic patterns as part of the pipeline. The patterns will be used downstream for alerting and incident response. The data scientist has access to unlabeled historic data to use, if needed.

The solution needs to do the following:

- Calculate an anomaly score for each web traffic entry.
- Adapt unusual event identification to changing web patterns over time.

Which approach should the data scientist implement to meet these requirements?

Community votes: D (13 votes, most voted)
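
The two requirements, a per-entry anomaly score and adaptation to drifting patterns, can be illustrated with a deliberately simple stand-in: an exponentially weighted running mean and variance that scores each value by its distance from the recent mean. This is not the algorithm of any particular AWS service; the class name, the `alpha` parameter, and the sample traffic values are all hypothetical.

```python
import math


class StreamingAnomalyScorer:
    """Scores each value against an exponentially weighted mean/variance.

    Older observations fade out at rate `alpha`, so the notion of
    "normal" follows gradual changes in the stream.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.mean = None
        self.var = 0.0

    def score(self, x):
        if self.mean is None:          # first observation: nothing to compare
            self.mean = x
            return 0.0
        diff = x - self.mean
        std = math.sqrt(self.var)
        anomaly = abs(diff) / std if std > 0 else 0.0
        # Update the running statistics after scoring the new value.
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomaly


# Hypothetical requests-per-second readings; the last one is a spike.
traffic = [10, 11, 9, 10, 11, 9, 10, 10, 50]
scorer = StreamingAnomalyScorer()
scores = [scorer.score(v) for v in traffic]
print(scores[-1] == max(scores))  # the spike receives the highest score
```

The first few scores are noisy because the variance estimate has not warmed up yet; a production pipeline would normally discard or damp them.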

109

A Data Scientist received a set of insurance records, each consisting of a record ID, the final outcome among 200 categories, and the date of the final outcome. Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records distributed over the past 3 years. The Data Scientist wants to predict how many claims to expect in each category from month to month, a few months in advance.

What type of machine learning model should be used?

Community votes: C (9 votes, most voted)

110

A company that promotes healthy sleep patterns by providing cloud-connected devices currently hosts a sleep tracking application on AWS. The application collects device usage information from device users. The company's Data Science team is building a machine learning model to predict if and when a user will stop utilizing the company's devices. Predictions from this model are used by a downstream application that determines the best approach for contacting users.

The Data Science team is building multiple versions of the machine learning model to evaluate each version against the company's business goals. To measure long-term effectiveness, the team wants to run multiple versions of the model in parallel for long periods of time, with the ability to control the portion of inferences served by the models.

Which solution satisfies these requirements with MINIMAL effort?

Community votes: B (8 votes, most voted), D (1 vote)
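
"Controlling the portion of inferences served by the models" amounts to weighted traffic splitting between model versions. The sketch below simulates that routing behaviour only; it is not the API of any AWS service, and the function name, model names, and weights are hypothetical.

```python
import random
from collections import Counter


def route_requests(weights, n_requests, seed=0):
    """Assign each request to a model version in proportion to its weight."""
    rng = random.Random(seed)
    names = list(weights)
    picks = rng.choices(
        names, weights=[weights[name] for name in names], k=n_requests
    )
    return Counter(picks)


# Hypothetical split: 80% of inferences to v1, 20% to v2.
counts = route_requests({"model-v1": 0.8, "model-v2": 0.2}, 10_000)
for name, count in sorted(counts.items()):
    print(name, count / 10_000)
```

Changing the weights shifts traffic without touching the callers, which is the property the question asks for when versions run side by side for long periods.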