Quiz
📌 Each column contains:
- an attribute or feature
- a different used car
📌 Which description best fits the Pandas library?
- Includes functions for some advanced math problems as listed in the slide as well as data visualization.
- Offers data structures and tools for effective data manipulation and analysis. It provides fast access to structured data. The primary instrument of Pandas is a two-dimensional table with column and row labels, called a DataFrame. It is designed to provide easy indexing functionality.
- Uses arrays for its inputs and outputs. It can be extended to objects for matrices, and with minor coding changes, developers can perform fast array processing.
📌 What task do the following lines of code perform?
path='C:\Windows\…\ automobile.csv'
df.to_csv(path)
- Exports your Pandas dataframe to a new csv file, in the location specified by the variable path.
- Loads a csv file.
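A minimal sketch of the export/load round trip, assuming a small made-up dataframe and a temporary directory in place of the path shown in the question:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the automobile dataset.
df = pd.DataFrame({"make": ["audi", "bmw"], "price": [13950, 16430]})

# Write to a temporary location (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "automobile.csv")
df.to_csv(path, index=False)   # export: dataframe -> csv file

df_loaded = pd.read_csv(path)  # load: csv file -> dataframe
print(df_loaded)
```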
📌 What does csv stand for?
- Comma Separated Values
- Car Sold values
- none of the above
📌 What library is primarily used for machine learning?
- scikit-learn
- Python
- Matplotlib
📌 What task does the following command perform?
df.to_csv("A.csv")
- Change the name of the column to “A.csv”
- Load the data from a csv file called “A” into a dataframe
- Save the dataframe df to a csv file called “A.csv”
📌 Consider the segment of the following dataframe:
What is the type of the column make?
- int64
- float64
- object
📌 How would you access the column “symboling” from the dataframe df?
- df["symboling"]
- df=="symboling"
📌 What is the correct symbol for missing data?
- nan
- no-data
📌 How would you rename the column “city_mpg” to “city-L/100km”?
- df.rename(columns={"city_mpg": "city-L/100km"})
- df.rename(columns={"city_mpg": "city-L/100km"}, inplace=True)
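The difference between the two calls can be sketched as follows, assuming a one-column dataframe with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"city_mpg": [21, 19, 24]})  # made-up values

# Without inplace=True, rename returns a new dataframe; df keeps its old name.
renamed = df.rename(columns={"city_mpg": "city-L/100km"})

# With inplace=True, the dataframe the method is called on is modified.
df_copy = df.copy()
df_copy.rename(columns={"city_mpg": "city-L/100km"}, inplace=True)

print(list(df.columns), list(renamed.columns), list(df_copy.columns))
```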
📌 Why do we convert values of Categorical Variables into numerical values?
- To save memory
- Most statistical models cannot take in objects or strings as inputs
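One common way to do this conversion is one-hot encoding. The sketch below uses pd.get_dummies on a hypothetical "fuel" column; neither the column name nor the choice of encoding comes from the quiz itself:

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"]})

# One indicator column per category; each row has exactly one "on" value.
dummies = pd.get_dummies(df["fuel"])
print(dummies)
```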
📌 Consider the dataframe df; what method provides the summary statistics?
- describe()
- head()
- tail()
📌 If we have 10 columns and 100 samples, how large is the output of df.corr()?
- 10 x 100
- 10 x 10
- 100 x 100
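A quick way to check the output size is to build a random dataframe with 100 samples and 10 numeric columns (made-up data) and inspect the shape of the result:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 10)))  # 100 samples, 10 columns

# corr() compares every column with every other column.
corr = df.corr()
print(corr.shape)
```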
📌 If the p-value of the Pearson Correlation is 1, then …
- The variables are correlated
- The variables are not correlated
- None of the above
📌 Consider the following dataframe:
df_test = df[['body-style', 'price']]
The following operation is applied:
df_grp = df_test.groupby(['body-style'], as_index=False).mean()
What are the resulting values of df_grp['price']?
- The average price for each body style
- The average price
- The average body style
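The grouping operation can be tried on a small made-up dataframe (the values below are illustrative, not from the course dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "body-style": ["sedan", "sedan", "wagon", "wagon"],
    "price": [10000.0, 14000.0, 9000.0, 11000.0],
})

df_test = df[["body-style", "price"]]

# One row per body style; price is averaged within each group.
df_grp = df_test.groupby(["body-style"], as_index=False).mean()
print(df_grp)
```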
📌 What is the Pearson Correlation between variables X and Y, if X=-Y?
- -1
- 1
- 0
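This case can be checked numerically. The sketch uses NumPy's corrcoef with arbitrary values for X:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.5, 5.0])  # arbitrary values
y = -x

# corrcoef returns a 2 x 2 matrix; the off-diagonal entry is the
# Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]
print(r)
```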
📌 What does the following line of code do?
lm = LinearRegression()
- Fit a regression object lm
- Create a linear regression object
- Predict a value
📌 What steps do the following lines of code perform?
Input=[('scale',StandardScaler()),('model',LinearRegression())]
pipe=Pipeline(Input)
pipe.fit(Z,y)
ypipe=pipe.predict(Z)
- Standardize the data, then perform a polynomial transform on the features Z
- Find the correlation between Z and y
- Standardize the data, then perform a prediction using a linear regression model using the features Z and targets y
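A runnable version of the pipeline, assuming scikit-learn is installed and using synthetic data for Z and y (the data is made up for this sketch):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))                 # synthetic features
y = Z @ np.array([1.0, -2.0, 0.5]) + 4.0     # noiseless linear target

# Each step is a (name, estimator) pair; steps run in order.
Input = [("scale", StandardScaler()), ("model", LinearRegression())]
pipe = Pipeline(Input)

pipe.fit(Z, y)           # scale Z, then fit the linear model
ypipe = pipe.predict(Z)  # scale Z with the fitted scaler, then predict
print(ypipe[:3])
```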
📌 X is a dataframe with 100 rows and 5 columns, and y is the target with 100 samples. Assume all the relevant libraries and data have been imported and the following lines of code have been executed:
LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)
How many samples does yhat contain?
- 500
- 5
- 100
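The size of the prediction can be confirmed with random data of the stated dimensions. This sketch assumes scikit-learn is installed, and the column names x0..x4 are made up:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 5)),
                 columns=[f"x{i}" for i in range(5)])  # 100 rows, 5 columns
y = rng.normal(size=100)                                # 100 target samples

LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)  # one prediction per row of X
print(yhat.shape)
```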
📌 What value of R^2 (coefficient of determination) indicates your model performs best?
- -1
- 1
- 0
📌 Consider the following equation:
Y = b0 + b1x
What is the parameter b_0 (b subscript 0)?
- The predictor or independent variable
- The target or dependent variable
- The intercept
- The slope
📌 What is the output of the following code?
cross_val_predict(lr2e, x_data, y_data, cv=3)
- The predicted values of the test data using cross-validation
- The average R^2 on the test data for each of the two folds
- This function finds the free parameter alpha
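A minimal sketch of the call, assuming scikit-learn is installed, that lr2e is a LinearRegression object, and that x_data and y_data are synthetic arrays:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
x_data = rng.normal(size=(30, 2))                         # synthetic features
y_data = x_data[:, 0] + rng.normal(scale=0.1, size=30)    # synthetic target

lr2e = LinearRegression()

# Each sample is predicted by a model trained on the folds
# that do not contain that sample.
yhat = cross_val_predict(lr2e, x_data, y_data, cv=3)
print(yhat.shape)
```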
📌 What dictionary value would we use to perform a grid search for the following values of alpha? 1, 10, 100
No other parameter values should be tested
- alpha=[1,10,100]
- [{'alpha': [1,10,100]}]
- [{'alpha': [0.001,0.1,1, 10, 100, 1000,10000,100000,100000],'normalize':[True,False]} ]
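A parameter grid of this form is typically passed to scikit-learn's GridSearchCV. The sketch below assumes a Ridge model and synthetic data, neither of which is stated in the question:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))                              # synthetic features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=60)

# Only alpha is searched, and only over these three values.
parameters = [{"alpha": [1, 10, 100]}]

grid = GridSearchCV(Ridge(), parameters, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```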
📌 You have a linear model; the average R^2 value on your training data is 0.5, you perform a 100th order polynomial transform on your data then use these values to train another model. After this step, your average R^2 is 0.99; which of the following comments is correct?
- You should always use the simplest model
- 100-th order polynomial will work better on unseen data
- The results on your training data are not the best indicator of how your model performs; you should use your test data to get a better idea
📌 What type of file allows data to be saved in a tabular format?
- html
- csv
📌 What Python libraries were considered “Algorithmic Libraries” in this course?
- Pandas, Numpy, SciPy
- Scikit-learn, Statsmodels
- Matplotlib, Seaborn
📌 What path tells us where the data is stored?
- Scheme path
- File path
- Encoding path
📌 What does the head() method return?
- It returns the data types of each column
- It returns the last five rows
- It returns the first five rows
📌 The Pandas library allows us to read what?
- Only headers
- Various datasets into a data frame
- Only rows
📌 The Pandas library is mostly used for what?
- Data analysis
- Machine learning
- Data visualization
📌 What would the following code segment output from a dataframe df? df.head(5)
- It would return the first 5 rows of the dataframe
- It would return the last 5 rows of the dataframe
- It would return all of the rows of the dataframe
📌 What does the following code segment perform in a dataframe?
mean = df["normalized-losses"].mean()
df["normalized-losses"].replace(np.nan, mean)
- It replaces the missing values in the column “normalized-losses” with the mean of that column
- It drops rows that contain missing values
- It drops all of the rows in the column “normalized-losses”
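The same two steps on a three-row made-up column. Note that replace returns a new Series, so this sketch assigns the result back to keep the change:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"normalized-losses": [100.0, np.nan, 200.0]})  # made-up

# mean() skips missing values by default.
mean = df["normalized-losses"].mean()

# replace returns a new Series; df is unchanged until we assign it back.
filled = df["normalized-losses"].replace(np.nan, mean)
still_missing = df["normalized-losses"].isna().any()

df["normalized-losses"] = filled
print(df)
```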
📌 How would you multiply each element in the column df["c"] by 5 and assign it back to the column df["c"]?
- 5 * df["b"]
- df["c"] = 5 * df["c"]
- df["a"] = df["c"] * 5
📌 What function returns the maximum value of the requested column?
- max()
- std()
- min()
📌 Since most statistical models cannot take objects or strings as inputs, what action needs to be performed?
- Convert numerical values into categorical variables
- Convert categorical variables into numerical values
📌 What function will change the name of a column in a dataframe?
- rename()
- replace()
- exchange()