15 Python Coding Interview Questions You Must Know For Data Science

Harvard Business Review referred to data scientist as the “Sexiest Job of the 21st Century.” Glassdoor placed it #1 on the 25 Best Jobs in America list. According to IBM, demand for this role will soar 28 percent by 2020. It should come as no surprise that in the new era of big data and machine learning, data scientists are becoming rock stars. Companies that are able to leverage massive amounts of data to improve the way they serve customers, build products, and run their operations will be positioned to thrive in this economy.

If you’re on the path to becoming a data scientist, you must be prepared to impress prospective employers with your knowledge, and that means being ready to crack your next data science interview in one go. We have compiled a list of the most popular data science interview questions you can expect in your next interview.

24. You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 96 percent. Why shouldn’t you be happy with your model performance? What can you do about it?

Cancer detection produces imbalanced data: far fewer patients have cancer than do not. On an imbalanced dataset, accuracy should not be used as the measure of performance, because a model can reach high accuracy simply by predicting the majority class. It is important to focus on the remaining four percent, which represents the patients who were wrongly diagnosed. Early diagnosis is crucial in cancer detection and can greatly improve a patient’s prognosis.

Hence, to evaluate model performance, we should use sensitivity (true positive rate), specificity (true negative rate), and the F measure to determine the class-wise performance of the classifier.
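As a rough illustration of why accuracy misleads here, the sketch below computes these metrics by hand in plain Python. The function name `classification_metrics`, the label convention (1 = cancer, 0 = no cancer), and the 100-patient data are all assumptions made up for this example; they are not taken from any real dataset.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity (true positive rate): share of real positives that were found
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity (true negative rate): share of real negatives that were found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and sensitivity (recall)
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical data: 100 patients, only 4 of whom have cancer (label 1)
y_true = [1] * 4 + [0] * 96
# A useless model that predicts "no cancer" for every patient
y_pred = [0] * 100

metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy is 0.96, yet sensitivity and F1 are both 0.0
```

In this made-up case the model never detects a single cancer patient, yet it still scores 96 percent accuracy, which is why the class-wise metrics are the ones to watch.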

1. What are the differences between supervised and unsupervised learning?

Supervised Learning

  • Uses known and labeled data as input
  • Supervised learning has a feedback mechanism
  • The most commonly used supervised learning algorithms are decision trees, logistic regression, and support vector machines

Unsupervised Learning

  • Uses unlabeled data as input
  • Unsupervised learning has no feedback mechanism
  • The most commonly used unsupervised learning algorithms are k-means clustering, hierarchical clustering, and the Apriori algorithm
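To make the contrast concrete, here is a minimal sketch in plain Python. It assumes a toy one-dimensional dataset, uses a decision stump (a one-level decision tree) as the supervised learner, and a two-cluster k-means as the unsupervised one. The function names `fit_stump` and `two_means_1d`, and the convention that class 1 has the larger values, are hypothetical choices made for illustration only.

```python
def fit_stump(xs, ys):
    """Supervised: choose the threshold that misclassifies the fewest labels.

    Assumes points above the threshold belong to class 1.
    """
    best_threshold, best_errors = None, len(xs) + 1
    points = sorted(set(xs))
    # Candidate thresholds are midpoints between neighbouring distinct values
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        # The labels act as the feedback signal: count wrong predictions
        errors = sum((x > threshold) != bool(y) for x, y in zip(xs, ys))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def two_means_1d(xs, iterations=10):
    """Unsupervised: group points around two centroids without any labels."""
    centroids = [min(xs), max(xs)]  # simple starting guesses
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            # Assign each point to its nearest centroid
            nearest = 0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


xs = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
ys = [0, 0, 0, 1, 1, 1]  # labels are available only to the supervised learner

threshold = fit_stump(xs, ys)   # learned from inputs and labels
centroids = two_means_1d(xs)    # learned from inputs alone
print(threshold, centroids)
```

The stump needs `ys` to measure its errors, while the clustering routine never sees a label and only discovers structure in `xs`.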

23. Write a basic SQL query that lists all orders with customer information.

Usually, we have order tables and customer tables that contain the following columns:

  • Order table: OrderId, CustomerId, OrderNumber, TotalAmount
  • Customer table: Id, FirstName, LastName, City, Country

The SQL query is as follows. Because ORDER is a reserved keyword in SQL, the table name is quoted:

    SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
    FROM "Order"
    JOIN Customer
    ON "Order".CustomerId = Customer.Id
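One way to try the join is with an in-memory SQLite database from Python's standard `sqlite3` module, as sketched below. The column types and the sample rows are assumptions invented for this example. The table name is written as `"Order"` in double quotes because ORDER is a reserved SQL keyword and the unquoted name would be rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        Id INTEGER PRIMARY KEY,
        FirstName TEXT, LastName TEXT, City TEXT, Country TEXT
    );
    -- "Order" is quoted because ORDER is a reserved SQL keyword
    CREATE TABLE "Order" (
        OrderId INTEGER PRIMARY KEY,
        CustomerId INTEGER REFERENCES Customer(Id),
        OrderNumber TEXT,
        TotalAmount REAL
    );
    -- Hypothetical sample rows
    INSERT INTO Customer VALUES (1, 'Ada', 'Lovelace', 'London', 'UK');
    INSERT INTO Customer VALUES (2, 'Alan', 'Turing', 'Manchester', 'UK');
    INSERT INTO "Order" VALUES (10, 1, 'A-100', 250.0);
    INSERT INTO "Order" VALUES (11, 2, 'A-101', 99.5);
""")

# List every order together with the matching customer's details
rows = conn.execute("""
    SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
    FROM "Order"
    JOIN Customer ON "Order".CustomerId = Customer.Id
    ORDER BY OrderNumber
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

A plain `JOIN` is an inner join, so an order whose `CustomerId` matches no customer would be left out of the result.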

9. You are given a data set consisting of variables with more than 30 percent missing values. How will you deal with them?

The following are ways to handle missing data values:

If the data set is large, we can simply remove the rows with missing data values. It is the quickest way, and we use the rest of the data to make predictions.

For smaller data sets, we can substitute missing values with the mean of the rest of the data using a pandas DataFrame in Python. One way to do so is to compute the column means with df.mean() and pass them to df.fillna(), as in df.fillna(df.mean()).
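Both approaches can be sketched in a few lines of pandas. The DataFrame below, including its column names and values, is hypothetical and exists only to show the calls. `numeric_only=True` is passed to `mean()` so that the same line would also work on a frame that contains non-numeric columns.

```python
import pandas as pd

# Hypothetical data with gaps in two numeric columns
df = pd.DataFrame({
    "age": [25.0, None, 35.0, 40.0],
    "income": [50000.0, 60000.0, None, 70000.0],
})

# Share of missing values per column, useful for checking a cutoff such as 30 percent
missing_share = df.isna().mean()

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: replace each missing value with the mean of its own column
filled = df.fillna(df.mean(numeric_only=True))

print(missing_share)
print(dropped)
print(filled)
```

Dropping rows shrinks the data set, while mean imputation keeps every row at the cost of reducing the variance of the filled columns, so the choice usually depends on how much data can be spared.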
