Harvard Business Review referred to data scientist as the "Sexiest Job of the 21st Century." Glassdoor placed it #1 on the 25 Best Jobs in America list. According to IBM, demand for this role will soar 28 percent by 2020. It should come as no surprise that in the new era of big data and machine learning, data scientists are becoming rock stars. Companies that are able to leverage massive amounts of data to improve the way they serve customers, build products, and run their operations will be positioned to thrive in this economy.
If you're moving down the path to becoming a data scientist, you must be prepared to impress prospective employers with your knowledge, and to do that you must be able to crack your next data science interview in one go! We have compiled a list of the most popular data science interview questions you can expect in your next interview!
24. You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 96 percent. Why shouldn’t you be happy with your model performance? What can you do about it?
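Cancer detection data is typically imbalanced: far fewer patients have cancer than do not. In an imbalanced dataset, accuracy should not be used as a measure of performance. For example, if 4 percent of patients have cancer, a model that predicts "no cancer" for everyone achieves 96 percent accuracy while detecting no cases at all. It is important to focus on the remaining four percent, which represents the patients who were wrongly diagnosed. Early diagnosis is crucial when it comes to cancer detection and can greatly improve a patient's prognosis.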
Hence, to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate), and the F-measure to determine the class-wise performance of the classifier.
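As a minimal sketch, the metrics can be computed directly from the four confusion-matrix counts. The function name `class_metrics` and the toy labels are hypothetical, with 1 assumed to mean "cancer" and 0 "no cancer". The example reproduces the trap described above: a model that never predicts cancer still scores 96 percent accuracy.

```python
def class_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # sick, flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # sick, missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # healthy, flagged

    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


# Hypothetical data: 100 patients, 4 with cancer; the model predicts "no cancer" for all.
y_true = [1] * 4 + [0] * 96
y_pred = [0] * 100
print(class_metrics(y_true, y_pred))  # accuracy is 0.96, yet sensitivity and F1 are 0.0
```

Sensitivity exposes the missed cancer cases that accuracy hides, which is why it matters most when false negatives are costly.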
1. What are the differences between supervised and unsupervised learning?
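| Supervised Learning | Unsupervised Learning |
|---|---|
| Uses labeled data as input | Uses unlabeled data as input |
| Learns a mapping from inputs to known outputs | Finds structure or patterns in the data without known outputs |
| Typical tasks: classification and regression | Typical tasks: clustering, dimensionality reduction, and association rule mining |
| Common algorithms: decision trees, logistic regression, support vector machines | Common algorithms: k-means clustering, hierarchical clustering, the Apriori algorithm |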
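To make the difference between the two settings concrete, here is a minimal pure-Python sketch on hypothetical one-dimensional data. The supervised part is a nearest-centroid classifier that needs a label for every training point. The unsupervised part is k-means, which receives only the points and discovers the groups itself. The function names and the initial centers are illustrative assumptions, not a library API.

```python
def nearest_centroid_fit(xs, labels):
    """Supervised: learn one centroid (mean) per known label."""
    centroids = {}
    for label in set(labels):
        members = [x for x, y in zip(xs, labels) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(xs, initial_centers, iterations=10):
    """Unsupervised: group 1-D points into clusters using only the inputs."""
    centers = list(initial_centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each point joins its nearest center.
        for x in xs:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its points (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Hypothetical data: two obvious groups of values.
xs = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
labels = ["low", "low", "low", "high", "high", "high"]  # only supervised learning sees these

centroids = nearest_centroid_fit(xs, labels)
print(nearest_centroid_predict(centroids, 1.1))  # low

centers = kmeans_1d(xs, [0.0, 6.0])
print(centers)  # roughly 1.0 and 5.0, found without any labels
```

Both methods end up with similar centers here, but only the supervised model can name them, because only it was given the labels.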
23. Write a basic SQL query that lists all orders with customer information.
Usually, we have an orders table and a customers table that contain the following columns:
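As a minimal sketch, one possible answer joins the two tables on their shared customer key. It is run here through Python's built-in `sqlite3` module so it executes end to end. The table names `Orders` and `Customers`, all column names, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema and sample rows; real column names will differ.
cur.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    CustomerName TEXT,
    City TEXT
);
CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID),
    OrderDate TEXT,
    TotalAmount REAL
);
INSERT INTO Customers VALUES (1, 'Alice', 'Boston'), (2, 'Bob', 'Denver');
INSERT INTO Orders VALUES
    (101, 1, '2024-01-05', 250.0),
    (102, 2, '2024-01-06', 99.5),
    (103, 1, '2024-02-10', 40.0);
""")

# The interview answer: every order together with its customer's details.
query = """
SELECT o.OrderID, o.OrderDate, o.TotalAmount,
       c.CustomerID, c.CustomerName, c.City
FROM Orders AS o
INNER JOIN Customers AS c ON o.CustomerID = c.CustomerID
ORDER BY o.OrderID;
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

An `INNER JOIN` drops orders whose customer ID has no match; switching to `LEFT JOIN` keeps those orders and fills the customer columns with `NULL`.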
9. You are given a data set consisting of variables with more than 30 percent missing values. How will you deal with them?
The following are ways to handle missing data values:
If the data set is large, we can simply remove the rows with missing data values. It is the quickest way; we then use the remaining data to build the model.
For smaller data sets, we can substitute missing values with the mean of the rest of the data in that column, using a pandas DataFrame in Python. One way to do so is to compute the column means with df.mean() and pass them to df.fillna(), as in df.fillna(df.mean(numeric_only=True)).
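Both approaches can be sketched with pandas on a small hypothetical DataFrame (the column names and values are made up). The sketch also measures the share of missing values per column, which is how you would spot the variables above the 30 percent threshold mentioned in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with gaps in every column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0, np.nan],
    "income": [50.0, 60.0, np.nan, 80.0, 70.0],
    "city": ["A", "B", "B", None, "A"],
})

# Share of missing values per column (e.g., to flag columns above 30 percent).
missing_share = df.isna().mean()

# Option 1 (large data sets): drop every row that has any missing value.
dropped = df.dropna()

# Option 2 (smaller data sets): fill numeric gaps with each column's mean.
# numeric_only=True skips text columns such as "city", which stay unfilled.
column_means = df.mean(numeric_only=True)
filled = df.fillna(column_means)

print(missing_share)
print(dropped)
print(filled)
```

Here "age" is 40 percent missing, and dropping rows keeps only one of the five, which shows why row removal only suits large data sets. Mean imputation keeps every row but shrinks the column's variance, so it is a simple baseline rather than a universal fix.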