[Coursera] Data Analysis with Python Quiz

Coursera / IBM Professional Certificate / Data Science

2021.09.12.

Quiz

📌 Each column contains a:

attribute or feature
different used car

📌 What description best describes the library Pandas?

Includes functions for some advanced math problems as listed in the slide as well as data visualization.
Offers data structures and tools for effective data manipulation and analysis. It provides fast access to structured data. The primary instrument of Pandas is a two-dimensional table with column and row labels, called a DataFrame. It is designed to provide an easy indexing function.
Uses arrays as their inputs and outputs. It can be extended to objects for matrices, and with a little change of coding, developers perform fast array processing.

📌 What task do the following lines of code perform?

path='C:\Windows\…\ automobile.csv'
df.to_csv(path)

Exports your Pandas dataframe to a new csv file, in the location specified by the variable path.
Loads a csv file.
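
The difference between exporting and loading can be sketched as follows. This is a minimal example, assuming a small made-up dataframe and a temporary directory in place of the elided Windows path; the column names and values are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data; the quiz does not show the real dataframe.
df = pd.DataFrame({"make": ["audi", "bmw"], "price": [13950, 16430]})

# A temporary location stands in for the path used in the question.
path = os.path.join(tempfile.mkdtemp(), "automobile.csv")

# to_csv writes the dataframe to disk; index=False leaves out the row labels.
df.to_csv(path, index=False)

# Reading the file back with read_csv is the loading direction.
df_loaded = pd.read_csv(path)
```

Here to_csv is the export step and read_csv is the load step, so the two answer options describe opposite operations.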

📌 What does csv stand for?

Comma Separated Values
Car Sold values
none of the above

📌 What library is primarily used for machine learning?

scikit-learn
Python
Matplotlib

📌 What task does the following command perform?

df.to_csv("A.csv")

change the name of the column to “A.csv”
load the data from a csv file called “A” into a dataframe
Save the dataframe df to a csv file called “A.csv”

📌 Consider the segment of the following dataframe: What is the type of the column make?

int64
float64
object

📌 How would you access the column “symboling” from the dataframe df?

df["symboling"]
df=="symboling"

📌 What is the correct symbol for missing data?

nan
no-data

📌 How would you rename the column “city_mpg” to “city-L/100km”?

df.rename(columns={"city_mpg": "city-L/100km"})
df.rename(columns={"city_mpg": "city-L/100km"}, inplace=True)
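
The two calls differ only in whether the original dataframe is changed. A minimal sketch, assuming a one-column dataframe with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"city_mpg": [21, 19, 24]})  # hypothetical values

# Without inplace=True, rename returns a new dataframe and leaves df unchanged.
renamed = df.rename(columns={"city_mpg": "city-L/100km"})

# With inplace=True, the dataframe the method is called on is modified directly.
df_inplace = df.copy()
df_inplace.rename(columns={"city_mpg": "city-L/100km"}, inplace=True)
```

After this runs, df still has the column city_mpg, while renamed and df_inplace both carry the new name.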

📌 Why do we convert values of Categorical Variables into numerical values?

To save memory
Most statistical models cannot take in objects or strings as inputs
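
One common way to do this conversion is one-hot encoding. The sketch below uses pd.get_dummies on a hypothetical "fuel-type" column; the column name and values are assumptions, not taken from the quiz.

```python
import pandas as pd

df = pd.DataFrame({"fuel-type": ["gas", "diesel", "gas"]})  # hypothetical column

# get_dummies creates one indicator column per distinct category.
dummies = pd.get_dummies(df["fuel-type"])

# Cast to int so the indicators are plain 0/1 values.
dummies = dummies.astype(int)
```

Each row now has a 1 in the column of its category and a 0 elsewhere, which a numerical model can take as input.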

📌 Consider the dataframe df; what method provides the summary statistics?

describe()
head()
tail()
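
A short illustration of the summary that describe() produces, assuming a single numeric column with made-up prices:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]})  # hypothetical values

# describe() reports count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()

mean_price = summary.loc["mean", "price"]
```

By contrast, head() and tail() return rows of the data rather than statistics about it.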

📌 If we have 10 columns and 100 samples, how large is the output of df.corr()?

10 x 100
10x10
100x100
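
The shape of the correlation matrix can be checked directly. This sketch assumes random numeric data and hypothetical column names c0 to c9:

```python
import numpy as np
import pandas as pd

# 100 samples (rows) and 10 numeric columns of random data, purely illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 10)), columns=[f"c{i}" for i in range(10)])

# corr() compares columns pairwise, so the result is square in the number of columns.
corr = df.corr()
print(corr.shape)  # (10, 10)
```

The number of samples affects the values in the matrix but not its size.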

📌 If the p-value of the Pearson Correlation is 1, then …

The variables are correlated
The variables are not correlated
None of the above

📌 Consider the following dataframe:

df_test = df[['body-style', 'price']]

The following operation is applied:

df_grp = df_test.groupby(['body-style'], as_index=False).mean()

What are the resulting values of: df_grp['price']?

The average price for each body style
The average price
The average body style
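
The grouping operation above can be tried on a small example. The body styles and prices below are made up for illustration; the real dataset is not shown in the quiz.

```python
import pandas as pd

# Hypothetical prices for three body styles.
df = pd.DataFrame({
    "body-style": ["sedan", "sedan", "hatchback", "hatchback", "wagon"],
    "price": [10000, 14000, 8000, 9000, 12000],
})
df_test = df[["body-style", "price"]]

# One row per body style, holding the mean price of that group.
df_grp = df_test.groupby(["body-style"], as_index=False).mean()

# Map each body style to its average price for easy inspection.
avg_price = dict(zip(df_grp["body-style"], df_grp["price"]))
```

With as_index=False, "body-style" stays a regular column, so df_grp['price'] lines up row by row with the group names.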

📌 What is the Pearson Correlation between variables X and Y, if X=-Y?

-1
1
0
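
The case X = -Y can be verified numerically. The helper below is a plain-Python sketch of the Pearson formula written for this note (the name pearson is hypothetical), with arbitrary example values:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 4.0, 7.0]  # arbitrary example values
y = [-v for v in x]       # Y = -X

r = pearson(x, y)         # perfect negative linear relationship
```

Because every deviation of Y from its mean is the exact negative of the matching deviation of X, the covariance equals minus the variance and the ratio is -1 up to floating-point rounding.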

📌 What does the following line of code do?

lm = LinearRegression()

Fit a regression object lm
Create a linear regression object
Predict a value

📌 What steps do the following lines of code perform?

Input=[('scale',StandardScaler()),('model',LinearRegression())]
pipe=Pipeline(Input)
pipe.fit(Z,y)
ypipe=pipe.predict(Z)

Standardize the data, then perform a polynomial transform on the features Z
Find the correlation between Z and y
Standardize the data, then perform a prediction using a linear regression model using the features Z and targets y
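
To see the pipeline run end to end, here is a minimal sketch with synthetic data. The arrays Z and y are made up for the example; only the four pipeline lines come from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features Z and target y, made up for this sketch.
rng = np.random.default_rng(0)
Z = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = 2.0 * Z[:, 0] - Z[:, 1] + 5.0

# Each step is a (name, estimator) pair; the steps run in order.
Input = [("scale", StandardScaler()), ("model", LinearRegression())]
pipe = Pipeline(Input)

pipe.fit(Z, y)           # standardizes Z, then fits the regression on the scaled data
ypipe = pipe.predict(Z)  # applies the same scaling, then predicts
```

No polynomial transform appears in the list of steps, so the pipeline only scales and then fits a linear model.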

📌 If X is a dataframe with 100 rows and 5 columns, and y is the target with 100 samples, and assuming all the relevant libraries and data have been imported, and the following lines of code have been executed:

LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)

How many samples does yhat contain?
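
The size of the prediction can be checked with a quick sketch. It assumes random data and uses a NumPy array in place of the dataframe described in the question:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))  # 100 rows, 5 columns (array stand-in for the dataframe)
y = rng.normal(size=100)       # 100 target samples

LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)           # one prediction per row of X

print(len(yhat))  # 100
```

predict returns one value for each row passed in, so the number of columns does not change the length of the output.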

📌 What value of R^2 (coefficient of determination) indicates your model performs best?

-1
1
0
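
The three candidate values can be reproduced with a small worked example. The function r_squared below is a hypothetical helper implementing R^2 = 1 - SS_res / SS_tot on made-up numbers:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]

perfect = r_squared(y_true, y_true)              # predictions match exactly -> 1.0
baseline = r_squared(y_true, [6.0] * 4)          # always predict the mean -> 0.0
worse = r_squared(y_true, [9.0, 7.0, 5.0, 3.0])  # worse than the mean -> -3.0
```

A value of 1 means the residual error is zero, 0 means the model does no better than predicting the mean, and negative values mean it does worse.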

📌 Consider the following equation:

Y = b_0 + b_1 x

What is the parameter b_0 (b subscript 0)?

The predictor or independent variable
The target or dependent variable
The intercept
The slope
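
The roles of the two parameters can be seen by fitting a line to data generated with a known intercept and slope. The helper fit_line is a hypothetical least-squares sketch written for this note:

```python
def fit_line(x, y):
    """Ordinary least squares for Y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    b1 = cov / var_x             # slope
    b0 = mean_y - b1 * mean_x    # intercept: value of Y when x is 0
    return b0, b1


x = [0.0, 1.0, 2.0, 3.0]
y = [4.0 + 2.0 * v for v in x]  # generated with intercept 4 and slope 2

b0, b1 = fit_line(x, y)
```

The fit recovers 4 for the intercept and 2 for the slope, matching the values used to generate the data.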

📌 What is the output of the following code?

cross_val_predict(lr2e, x_data, y_data, cv=3)

The predicted values of the test data using cross-validation
The average R^2 on the test data for each of the two folds
This function finds the free parameter alpha
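
A small sketch of what cross_val_predict returns, assuming lr2e is a LinearRegression object and using synthetic data in place of the course dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Synthetic data, made up for this example.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(30, 2))
y_data = x_data[:, 0] + 0.1 * rng.normal(size=30)

lr2e = LinearRegression()  # assumed estimator; the quiz does not show its definition

# Each sample is predicted by a model trained on the folds that exclude it.
yhat = cross_val_predict(lr2e, x_data, y_data, cv=3)
```

The result has one predicted value per input sample rather than one score per fold; scores per fold come from cross_val_score instead.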

📌 What dictionary value would we use to perform a grid search for the following values of alpha: 1, 10, 100? No other parameter values should be tested.

alpha=[1,10,100]
[{'alpha': [1,10,100]}]
[{'alpha': [0.001,0.1,1, 10, 100, 1000,10000,100000,100000],'normalize':[True,False]} ]
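
To show how such a parameter grid is used, here is a minimal sketch with GridSearchCV. The choice of Ridge as the estimator, the synthetic data, and the value cv=4 are assumptions; the question itself only specifies the alpha values.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, made up for this example.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

# A list of dictionaries; each key is a parameter name of the estimator.
parameters = [{"alpha": [1, 10, 100]}]

grid = GridSearchCV(Ridge(), parameters, cv=4)  # Ridge is an assumed estimator
grid.fit(X, y)

# The alpha values that were actually evaluated.
tested = [p["alpha"] for p in grid.cv_results_["params"]]
```

Only the three listed values are tried, because the grid contains no other keys or values.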

📌 You have a linear model whose average R^2 value on your training data is 0.5. You perform a 100th-order polynomial transform on your data, then use these values to train another model. After this step, your average R^2 is 0.99. Which of the following comments is correct?

You should always use the simplest model
100-th order polynomial will work better on unseen data
The results on your training data are not the best indicator of how your model performs; you should use your test data to get a better idea

📌 What type of file allows data to be saved in a tabular format?

html
pdf
csv

📌 What Python libraries were considered “Algorithmic Libraries” in this course?

Pandas, Numpy, SciPy
Scikit-learn, Statsmodels
Matplotlib, Seaborn

📌 What path tells us where the data is stored?

Scheme path
File path
Encoding path

📌 What does the head() method return?

It returns the data types of each column
It returns the last five rows
It returns the first five rows

📌 The Pandas library allows us to read what?

Only headers
Various datasets into a data frame
Only rows

📌 The Pandas library is mostly used for what?

Data analysis
Machine learning
Data visualization

📌 What would the following code segment output from a dataframe df? df.head(5)

It would return the first 5 rows of the dataframe
It would return the last 5 rows of the dataframe
It would return all of the rows of the dataframe

📌 What does the following code segment perform in a dataframe?

mean = df["normalized-losses"].mean()
df["normalized-losses"].replace(np.nan, mean)

It replaces the missing values in the column “normalized-losses” with the mean of that column
It drops rows that contain missing values
It drops all of the rows in the column “normalized-losses”
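
The replacement can be tried on a tiny example with made-up values. Note that replace returns a new Series, so this sketch assigns the result back to the column to keep it; the assignment is an addition to the quiz snippet.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"normalized-losses": [100.0, np.nan, 200.0]})  # made-up values

# mean() skips missing values, so this is the mean of 100 and 200.
mean = df["normalized-losses"].mean()

# replace returns a new Series; assigning it back stores the result in df.
df["normalized-losses"] = df["normalized-losses"].replace(np.nan, mean)
```

No rows are dropped: the dataframe keeps all three rows, and the missing entry becomes 150.0.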

📌 How would you multiply each element in the column df["c"] by 5 and assign it back to the column df["c"]?

5 * df["b"]
df["c"] = 5 * df["c"]
df["a"] = df["c"] * 5

📌 What function returns the maximum of the values in the requested column?

max()
std()
min()

📌 Since most statistical models cannot take objects or strings as inputs, what action needs to be performed?

Convert numerical values into categorical variables
Convert categorical variables into numerical values

📌 What function will change the name of a column in a dataframe?

rename()
replace()
exchange()