Great resources: 21 Must-Know Data Science Interview Questions and Answers; 109 Data Science Interview Questions and Answers

  • Merge k arrays and sort them
  • How best to select a representative sample of search queries from 5 million?
  • Three friends told you it’s rainy. Each has a probability of 1/3 of lying. What is the probability that it is actually raining?
  • Can you explain the fundamentals of Naive Bayes? How do you set the threshold?
  • Can you explain what MapReduce is and how it works?
  • Can you explain SVM?
    • SVM is a maximal margin classifier
  • How do you detect if a new observation is an outlier? What is the bias-variance trade-off?
  • Discuss how to randomly select a sample from a product user population.
  • How do you implement autocomplete?
  • Describe how gradient boosting works.
  • Find the maximum-sum subsequence in an integer list.
  • What would you do to summarize a Twitter feed?
  • Explain the steps of data wrangling and cleaning before applying ML algorithms.
  • How to deal with unbalanced binary classification?
  • How do you measure the distance between data points?
    • Distance between data points
  • Define variance.
    • Variance is the expectation of the squared deviation of a random variable from its mean
  • What is the difference between a box plot and a histogram?
  • How do you solve the L2 regularized regression problem?
  • How to compute an inverse matrix faster by playing around with some computational tricks?
  • How to perform a series of calculations without a calculator?
  • What is the difference between good and bad data visualization?
  • How do you find percentile? Write the code.
  • Find max sum subsequence from a sequence of values.
  • What are the L1 and L2 regularization metrics, and how do they differ?
  • Write a function that checks if a word is a palindrome.
  • Describe Binary Classification
  • Calculate AUC of an ROC curve
  • How do you use A/B testing?
  • Write a function that returns samples from a normal distribution using a random Bernoulli trial generator.
  • What does P-Value mean?
  • Explain Linear Regression, assumptions and math equations
  • Define the CLT. How is it relevant for Uber?
  • Explain Logistic Regression, assumptions and math equations
  • How much would it cost to have a fleet of vehicles take street view photos of every major city in the US?
  • How to model cost of renting cars to drivers?
  • Explain how surge pricing algorithm works and how to test which strategy works better?
  • What is cross validation?
  • How do network effects influence the way you define experiments and measure outcomes?
  • What are anomaly detection methods?
  • How do driving conditions and congestion impact Uber revenue or rider experience?
  • How does caching work and how do you use it in Data science?
  • How to optimize marketing spend between various marketing channels?
  • How to calculate radius for Uber Pool in a city?
  • How to decide if a location should be included in Uber Pool?
  • What are time series forecasting techniques?
  • Explain PCA, assumptions, equations.
  • Does Uber cause traffic congestion?
  • There is a building with 100 floors. You are given 2 identical eggs. How do you use the 2 eggs to find the threshold floor N, where an egg will definitely break when dropped from floor N or from any floor above it?
  • You randomly draw a coin from 100 coins (1 unfair double-headed coin and 99 fair coins) and toss it 10 times. If the result is 10 heads, what is the probability that the coin is unfair?
  • Write a sorting algorithm for a numerical dataset in Python.
  • Facebook would like to develop a way to estimate the month and day of people’s birthdays, regardless of whether people give us that information directly. What methods would you propose, and what data would you use, to help with that task?
  • Use a Python built-in package to manipulate ‘csv’ data.
  • How would you compare the relative performance of two different back-end engines for automated generation of Facebook “Friend” suggestions?
  • Given a KPI, choose the right metric, perform ETL. (using SQL/Code)
  • You’re about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it’s raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that “Yes” it is raining. What is the probability that it’s actually raining in Seattle?
  • Consider a game with 2 players, A and B. Player A has 8 stones, player B has 6. Game proceeds as follows. First, A rolls a fair 6-sided die, and the number on the die determines how many stones A takes over from B. Next, B rolls the same die, and the exact same thing happens in reverse. This concludes the round. Whoever has more stones at the end of the round wins and the game is over. If players end up with equal # of stones at the end of the round, it is a tie and another round ensues. What is the probability that B wins in 1, 2, …, n rounds?
  • How do you get the count of each letter in a sentence?
  • How do you prove that males are on average taller than females by knowing just gender or height?
  • What is a monkey patch?
  • Given a list A of objects and another list B which is identical to A except that one element is removed, find the removed element.
  • Given a list of integers (positive & negative), write an algorithm to find whether there’s at least a pair of integers that sum up to zero. How would you improve your algorithm’s performance?
  • Make a histogram of 2 variables.
  • Build a histogram of post reply count in SQL (number of posts with x replies, x+1 replies, etc).
  • Build a table with a summary of feature usage per user every day (keep track of the last action by user and roll up every day).
  • You’re at a casino with two dice. If you roll a 5, you win and get paid $10. What is your expected payout? If you play until you win (however long that takes) and then stop, what is your expected payout?
  • What metric would you show small businesses if you were trying to have them sign up for Facebook Ads?
  • Given a table of friend requests sent and friend requests received, find the user with the most friends.
  • Likes/user and minutes spent on a platform are increasing but the total number of users is decreasing. What could be the root cause of it?
  • How many high schools that people have listed on their profiles are real? How do we find out, and deploy at scale, a way of finding invalid schools?
  • How do you map nicknames (Pete, Andy, Nick, Rob, etc) to real names?
  • Facebook sees that likes are up 10% year over year, why could this be?
  • If a PM says that they want to double the number of ads in Newsfeed, how would you figure out if this is a good idea or not?
  • How to design a customer satisfaction survey?
  • Tossing a coin ten times resulted in 8 heads and 2 tails. How would you analyze whether a coin is fair? What is the p-value?
  • You have 10 coins. You toss each coin 10 times (100 tosses in total) and observe the results. Would you modify the way you test the fairness of the coins?
  • Explain a probability distribution that is not normal and how to apply it.
  • Why use feature selection? If two predictors are highly correlated, what is the effect on the coefficients in the logistic regression? What are the confidence intervals of the coefficients?
  • K-means and Gaussian mixture models: what is the difference between K-means and EM? When using a Gaussian mixture model, how do you know it is applicable? (Normal distribution)
  • If the labels are known in the clustering project, how do you evaluate the performance of the model?
  • You have a Google app and you make a change. How do you test if a metric has increased or not?
  • Describe the process of data analysis.
  • Why not logistic regression, why GBM?
  • Derive the equations for GMM.
  • How would you measure how much users liked videos?
  • Simulate a bivariate normal
  • Derive variance of a distribution
  • How many people apply to Google per year?
  • How do you build estimators for medians?
  • If each of the two coefficient estimates in a regression model is statistically significant, would you expect the joint test of both together to be significant as well?
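One of the questions above asks for the probability that a coin drawn from 100 (1 double-headed, 99 fair) is the unfair one, given 10 heads in 10 tosses. A minimal sketch of the Bayes' rule calculation follows, assuming tosses are independent and the coin is drawn uniformly at random; the function name `posterior_unfair` and its parameters are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def posterior_unfair(n_coins=100, n_unfair=1, n_heads=10):
    """P(coin is double-headed | n_heads heads in n_heads tosses), via Bayes' rule."""
    prior_unfair = Fraction(n_unfair, n_coins)
    prior_fair = 1 - prior_unfair

    # Likelihood of seeing only heads under each hypothesis.
    like_unfair = Fraction(1)               # a double-headed coin always lands heads
    like_fair = Fraction(1, 2) ** n_heads   # a fair coin: (1/2)^n_heads

    # Total probability of the observed data (the evidence).
    evidence = prior_unfair * like_unfair + prior_fair * like_fair
    return prior_unfair * like_unfair / evidence


p = posterior_unfair()
print(p, float(p))  # exact fraction 1024/1123, roughly 0.912
```

Using `Fraction` keeps the arithmetic exact, which makes the result easy to check by hand: 0.01 / (0.01 + 0.99 / 1024) = 1024 / 1123.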

  • When would you use random forests vs. SVM, and why?
  • How can you make an unfair coin fair?
  • In SQL, what’s the difference between a primary key, a candidate key, a foreign key, and a super key?
  • How can you deal with unbalanced binary classification?
  • What is the difference between convex and non-convex cost function?
  • What are the advantages of dropout layers and how do they work?
  • What’s the difference between a clustered and non-clustered index? What are the advantages of each?
  • What are some of the steps for data wrangling and data cleaning before applying machine learning algorithms?
  • What kinds of methods can you use in order to analyse the topics over a set of documents?
  • When would you use linear regression vs multiple regression? How can multiple regression models be better or worse than linear regression models?
    • In multiple regression, collinearity might exist.
  • What can joins in SQL be used for? What’s the difference between a left join, an inner join, and a full outer join?
  • Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
    • A long-tailed distribution has tails that taper off gradually rather than drop off sharply; a high-frequency population is followed by a low-frequency population, which gradually tails off asymptotically.
    • Examples include city population sizes, size of reserves in a certain geological region, size of companies.
  • What is an example of a probability distribution that has a finite mean, but infinite variance?
    • A Pareto distribution with shape parameter 1 < α ≤ 2 (the mean is finite but the variance is infinite)
  • What is data scrubbing? What’s one way to scrub data?
    • Data scrubbing refers to the procedure of modifying or removing incomplete, incorrect, inaccurately formatted, or repeated data in a database. The key objective of data scrubbing is to make the data more accurate and consistent.
  • What is a Markov Decision Process and what is the goal for the decision maker? Which kinds of policies are the easiest to store and learn?
  • What is ensemble learning and what is the difference between bagging and boosting?
  • What is Occam’s Razor and how is it stated in ML theory?
  • What is a DBMS? What is an RDBMS? How do they differ from one another?
  • What are Lasso, Ridge, and Elastic net regularizations?
  • What is the exploration-exploitation trade-off? And what is the multi-armed bandit method?
  • What are examples of linear and non-linear dimensionality reduction techniques?
  • What is PAC Learnability and how does it relate to Uniform Convergence? Can you explain the difference between Agnostic PAC Learning vs Realizable PAC Learning?
  • Explain what precision and recall are. How do they relate to the ROC curve?
  • What are hash table collisions? Why do they happen? Explain how you might resolve one.
  • What is an example of a non-parametric clustering method and how does it work?
  • What are examples of times when false positives are more important than false negatives? What about when false negatives are more important? When are false positives and false negatives equally important?
  • What are the differences between SQL and PL/SQL? What is a database management system?
  • Define a SQL query. What is the difference between a SELECT and an UPDATE query? How do you use SQL in SAS, Python, and R?
  • How would you perform clustering on a million unique keywords, assuming you have 10 million data points, each one consisting of two keywords, and create a metric measuring how similar these two keywords are? How would you create this table of 10 million data points in the first place?
  • What is the Markov Property and why is it useful for modeling systems?
    • The Markov Property expresses the fact that, at a given time step and knowing the current state, we won’t get any additional information about the future by gathering information about the past.
  • What is PAC learnability and what is a sufficient condition for an ERM to PAC learn?
  • Write a mock SQL Query to find the second highest salary of Employee in a table where Salary and Employee ID are given.
  • What are the assumptions and uses of logistic regression and how do they differ from those of linear regression?
  • How would you write a regex in order to see if a word began with a vowel in an SQL Query?
  • What assumptions are needed for a linear regression? Are they the same for a logistic regression? How would you test the significance of the parameters for a linear regression? Would you use the same test in the case of a logistic regression?
  • What is the hardest challenge you have ever confronted and how did you improve afterwards?
  • What is the best way to combat biases in your algorithms?
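  • What is a dimensionality reduction method? Explain clearly how it works (e.g. what does a principal component in PCA represent?).
    • A principal component is a direction in feature space along which the data varies. The first principal component is the direction of maximum variance; each subsequent one captures the most remaining variance while being orthogonal to the previous ones.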
  • Explain your favorite machine learning algorithm and how exactly it works (e.g. Lloyd’s algorithm for K-Means, Decision Tree Construction).
  • Explain what is the difference between the weak and strong versions of the law of large numbers and the central limit theorem. When are they typically used? What are their requirements to be used?
    • Weak strong version
  • Given an n*n matrix where all numbers are distinct, describe an algorithm that can find the maximum length path (starting from any cell) such that all cells along the path are in increasing order with a difference of 1. We can move in 4 directions from a given cell (i, j), i.e., to (i+1, j), (i, j+1), (i-1, j), or (i, j-1), with the condition that adjacent cells on the path differ by 1. How many paths exist in total (side question)?
  • What are the support vectors in a SVM (support vector machine)?
  • Give an example of three random events X, Y, Z for which any pair are independent but all three are not mutually independent.
  • How can you determine the optimal number of clusters in an unsupervised learning problem (give at least two examples)?
  • If there are 8 marbles of equal weight and 1 marble that weighs a little bit more (for a total of 9 marbles), how many weighings are required to determine which marble is the heaviest?
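For the question above about three events that are pairwise independent but not mutually independent, one standard construction is two fair coin flips X and Y plus Z = X XOR Y. The sketch below checks this by brute-force enumeration; the helper names (`prob`, `pairwise_independent`, `mutually_independent`) are hypothetical and exist only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair, independent bits X and Y, with Z = X XOR Y.
# Each of the four (x, y, z) outcomes is equally likely.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]


def prob(event):
    """Probability of an event, given as a predicate over (x, y, z) outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def pairwise_independent():
    """Check P(A=a, B=b) == P(A=a) * P(B=b) for every pair of variables."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = prob(lambda o: o[i] == a and o[j] == b)
            if joint != prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b):
                return False
    return True


def mutually_independent():
    """Check that the full joint distribution factorizes into the marginals."""
    for a, b, c in product((0, 1), repeat=3):
        joint = prob(lambda o: o == (a, b, c))
        marginals = (prob(lambda o: o[0] == a)
                     * prob(lambda o: o[1] == b)
                     * prob(lambda o: o[2] == c))
        if joint != marginals:
            return False
    return True


print(pairwise_independent())  # True
print(mutually_independent())  # False
```

Any two of the variables determine the third, so the triple cannot be mutually independent: for example, P(X=0, Y=0, Z=0) is 1/4, not 1/8.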

References:

25 Microsoft AI Interview Questions