[Coursera] Data Analysis with Python Quiz
Coursera / IBM Professional Certificate / Data Science
2021.09.12.

Quiz

📌 Each column contains a:

  • attribute or feature
  • different used car


📌 What description best describes the library Pandas?

  • Includes functions for some advanced math problems as listed in the slide as well as data visualization.
  • Offers data structures and tools for effective data manipulation and analysis. It provides fast access to structured data. The primary instrument of Pandas is a two-dimensional table with row and column labels, called a DataFrame. It is designed to provide easy indexing.
  • Uses arrays as their inputs and outputs. It can be extended to objects for matrices, and with a little change of coding, developers perform fast array processing.


📌 What task do the following lines of code perform?

path = r'C:\Windows\…\automobile.csv'
df.to_csv(path)

  • Exports your Pandas dataframe to a new csv file, in the location specified by the variable path.
  • Loads a csv file.
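
A minimal sketch of the difference between exporting and loading, assuming a tiny made-up dataframe and a temporary directory in place of the elided Windows path (both are illustrative, not from the quiz):

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row dataframe standing in for the automobile data.
df = pd.DataFrame({"make": ["audi", "bmw"], "price": [13950, 16430]})

# Write to a temporary location instead of the elided path in the question.
path = os.path.join(tempfile.mkdtemp(), "automobile.csv")

df.to_csv(path, index=False)   # export: dataframe -> csv file
df_loaded = pd.read_csv(path)  # load: csv file -> dataframe

print(df_loaded)
```

`to_csv` writes the file and `read_csv` reads it back; `index=False` is used here only so the row index is not written as an extra column.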


📌 What does csv stand for?

  • Comma Separated Values
  • Car Sold values
  • none of the above


📌 What library is primarily used for machine learning?

  • scikit-learn
  • Python
  • Matplotlib


📌 What task does the following command perform?

df.to_csv("A.csv")

  • Changes the name of the column to “A.csv”
  • Loads the data from a csv file called “A” into a dataframe
  • Saves the dataframe df to a csv file called “A.csv”


📌 Consider the segment of the following dataframe: What is the type of the column make?

  • int64
  • float64
  • object
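
The dataframe segment from the question is not reproduced here, so the following is only a sketch with made-up rows. It shows how `dtypes` reports column types; how a text column such as `make` is labeled can depend on the pandas version (traditionally `object`).

```python
import pandas as pd

# Hypothetical rows; only the column names echo the course dataset.
df = pd.DataFrame(
    {
        "symboling": [3, 1],           # whole numbers
        "make": ["audi", "bmw"],       # text
        "price": [13950.0, 16430.0],   # decimals
    }
)

# One dtype per column.
print(df.dtypes)
```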


📌 How would you access the column “symboling” from the dataframe df?

  • df["symboling"]
  • df == "symboling"


📌 What is the correct symbol for missing data?

  • nan
  • no-data


📌 How would you rename the column “city_mpg” to “city-L/100km”?

  • df.rename(columns={"city_mpg": "city-L/100km"})
  • df.rename(columns={"city_mpg": "city-L/100km"}, inplace=True)
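
The two forms behave differently, which a small sketch can show. The one-column dataframe below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"city_mpg": [21, 27]})

# Without inplace, rename returns a new dataframe and leaves df untouched.
renamed = df.rename(columns={"city_mpg": "city-L/100km"})
columns_before = list(df.columns)

# With inplace=True, df itself is modified and the call returns None.
df.rename(columns={"city_mpg": "city-L/100km"}, inplace=True)
columns_after = list(df.columns)

print(columns_before, columns_after)
```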


📌 Why do we convert values of Categorical Variables into numerical values?

  • To save memory
  • Most statistical models cannot take in objects or strings as inputs
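
One common way to do this conversion is one-hot encoding. The sketch below uses `pd.get_dummies`, which is an assumption here since the quiz does not name a method, and the `fuel-type` values are made up.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"fuel-type": ["gas", "diesel", "gas"]})

# One indicator column per category; each row is flagged in exactly one of them.
dummies = pd.get_dummies(df["fuel-type"])

print(dummies)
```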


📌 Consider the dataframe df; what method provides the summary statistics?

  • describe()
  • head()
  • tail()


📌 If we have 10 columns and 100 samples, how large is the output of df.corr()?

  • 10 x 100
  • 10x10
  • 100x100
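
A quick check of the shape, assuming random numeric data with hypothetical column names `c0` to `c9`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 100 samples (rows) and 10 numeric columns.
df = pd.DataFrame(rng.normal(size=(100, 10)), columns=[f"c{i}" for i in range(10)])

# corr() compares every column with every other column,
# so its size depends on the number of columns, not the number of rows.
corr = df.corr()

print(corr.shape)
```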


📌 If the p-value of the Pearson Correlation is 1, then …

  • The variables are correlated
  • The variables are not correlated
  • None of the above


📌 Consider the following dataframe:

df_test = df[['body-style', 'price']]

The following operation is applied:

df_grp = df_test.groupby(['body-style'], as_index=False).mean()

What are the resulting values of: df_grp['price']?

  • The average price for each body style
  • The average price
  • The average body style
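
The same two lines can be run on a tiny made-up dataframe to see what ends up in `df_grp['price']`. The three rows below are illustrative only.

```python
import pandas as pd

# Hypothetical prices for two body styles.
df = pd.DataFrame(
    {
        "body-style": ["sedan", "sedan", "wagon"],
        "price": [10000.0, 20000.0, 12000.0],
    }
)

df_test = df[["body-style", "price"]]

# One row per body style; price holds the mean of that group.
df_grp = df_test.groupby(["body-style"], as_index=False).mean()

print(df_grp)
```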


📌 What is the Pearson Correlation between variables X and Y, if X=-Y?

  • -1
  • 1
  • 0
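
The X = -Y case can be checked numerically. This sketch uses `np.corrcoef` on made-up values; the result is the Pearson coefficient up to floating-point rounding.

```python
import numpy as np

# Arbitrary values for Y; X is its exact negation.
y = np.array([1.0, 2.0, 4.0, 7.0])
x = -y

# Off-diagonal entry of the 2x2 correlation matrix is the Pearson coefficient.
r = np.corrcoef(x, y)[0, 1]

print(r)
```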


📌 What does the following line of code do?

lm = LinearRegression()

  • Fit a regression object lm
  • Create a linear regression object
  • Predict a value


📌 What steps do the following lines of code perform?

Input=[('scale',StandardScaler()),('model',LinearRegression())]
pipe=Pipeline(Input)
pipe.fit(Z,y)
ypipe=pipe.predict(Z)

  • Standardize the data, then perform a polynomial transform on the features Z
  • Find the correlation between Z and y
  • Standardize the data, then perform a prediction using a linear regression model using the features Z and targets y
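
A self-contained version of the same pipeline, assuming synthetic noise-free data for `Z` and `y` (the quiz does not show where they come from) and the name `steps` in place of `Input`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features and an exactly linear target.
Z = rng.normal(size=(50, 3))
y = Z @ np.array([1.0, -2.0, 0.5]) + 3.0

# Each step is a (name, estimator) pair; steps run in order.
steps = [("scale", StandardScaler()), ("model", LinearRegression())]
pipe = Pipeline(steps)

pipe.fit(Z, y)           # scale Z, then fit the linear model on the scaled features
ypipe = pipe.predict(Z)  # scale Z the same way, then predict

print(ypipe[:5])
```

There is no polynomial step in this pipeline; it only contains the scaler and the regression model.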


📌 If X is a dataframe with 100 rows and 5 columns, and y is the target with 100 samples, and assuming all the relevant libraries and data have been imported, and the following line of code has been executed:

LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)

How many samples does yhat contain?

  • 500
  • 5
  • 100
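
The setup in the question can be reproduced with random data to inspect the shape of `yhat`. The column names `a` to `e` are made up.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# 100 rows, 5 feature columns, and a target with 100 samples.
X = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("abcde"))
y = rng.normal(size=100)

LR = LinearRegression()
LR.fit(X, y)
yhat = LR.predict(X)  # one prediction per row of X

print(yhat.shape)
```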


📌 What value of R^2 (coefficient of determination) indicates your model performs best?

  • -1
  • 1
  • 0
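
A small sketch with `r2_score` from scikit-learn on made-up numbers shows how R^2 behaves for perfect, mean-only, and poor predictions:

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]

# Predictions identical to the targets.
perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])

# Always predicting the mean of the targets.
baseline = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])

# Predictions in reverse order, worse than predicting the mean.
worse = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])

print(perfect, baseline, worse)
```

Note that R^2 is not bounded below by -1; a sufficiently bad model can score lower.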


📌 Consider the following equation:

Y = b_0 + b_1 x

What is the parameter b_0 (b subscript 0)?

  • The predictor or independent variable
  • The target or dependent variable
  • The intercept
  • The slope
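
To connect the equation to code: after fitting scikit-learn's `LinearRegression`, b_0 is available as `intercept_` and b_1 as `coef_`. The data below is made up so that b_0 = 2 and b_1 = 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from Y = 2 + 3x, with no noise.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 + 3.0 * x[:, 0]

lm = LinearRegression()  # creates the object; nothing is fitted yet
lm.fit(x, y)             # estimates b_0 and b_1 from the data

print(lm.intercept_, lm.coef_[0])
```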


📌 What is the output of the following code?

cross_val_predict(lr2e, x_data, y_data, cv=3)

  • The predicted values of the test data using cross-validation
  • The average R^2 on the test data for each of the two folds
  • This function finds the free parameter alpha
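
A runnable sketch of the same call, assuming `lr2e` is a plain `LinearRegression` and using synthetic data (neither is specified in the quiz):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical data: 30 samples, 2 features, slightly noisy linear target.
x_data = rng.normal(size=(30, 2))
y_data = x_data @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=30)

lr2e = LinearRegression()

# Each sample is predicted by a model trained on the folds that do not contain it.
yhat = cross_val_predict(lr2e, x_data, y_data, cv=3)

print(yhat.shape)
```

The result is an array of predictions, one per sample, rather than a score.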


📌 What dictionary value would we use to perform a grid search for the following values of alpha? 1, 10, 100 No other parameter values should be tested

  • alpha=[1,10,100]
  • [{'alpha': [1,10,100]}]
  • [{'alpha': [0.001,0.1,1, 10, 100, 1000,10000,100000,100000],'normalize':[True,False]} ]
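
A sketch of how such a parameter grid is passed to `GridSearchCV`. `Ridge` is assumed as the estimator because it has an `alpha` parameter; the quiz does not name the model, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Hypothetical data.
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=40)

# A list of dictionaries: keys are parameter names, values are the candidates to try.
parameters = [{"alpha": [1, 10, 100]}]

grid = GridSearchCV(Ridge(), parameters, cv=4)
grid.fit(X, y)

print(grid.best_params_)
```

Only the three listed values of alpha are tried, because no other keys or values appear in the dictionary.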


📌 You have a linear model; the average R^2 value on your training data is 0.5, you perform a 100th order polynomial transform on your data then use these values to train another model. After this step, your average R^2 is 0.99; which of the following comments is correct?

  • You should always use the simplest model
  • 100-th order polynomial will work better on unseen data
  • The results on your training data are not the best indicator of how your model performs; you should use your test data to get a better idea
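
The gap between training and test scores can be explored with a small experiment. This is only a sketch: it uses a 10th-order polynomial instead of a 100th-order one to keep the numerics manageable, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Hypothetical data: a noisy straight line.
x = rng.uniform(-1, 1, size=(40, 1))
y = 2.0 * x[:, 0] + rng.normal(scale=0.5, size=40)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

linear = LinearRegression().fit(x_train, y_train)
poly = make_pipeline(PolynomialFeatures(degree=10), LinearRegression()).fit(
    x_train, y_train
)

# Compare R^2 on the data the model saw with R^2 on data it did not see.
for name, model in [("linear", linear), ("degree-10", poly)]:
    train_r2 = model.score(x_train, y_train)
    test_r2 = model.score(x_test, y_test)
    print(name, "train R^2:", train_r2, "test R^2:", test_r2)
```

The more flexible model cannot score lower than the linear one on the training data, so a high training R^2 alone says little; the test R^2 is the number to look at.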


📌 What type of file allows data to be saved in a tabular format?

  • html
  • pdf
  • csv


📌 What Python libraries were considered “Algorithmic Libraries” in this course?

  • Pandas, Numpy, SciPy
  • Scikit-learn, Statsmodels
  • Matplotlib, Seaborn


📌 What path tells us where the data is stored?

  • Scheme path
  • File path
  • Encoding path


📌 What does the head() method return?

  • It returns the data types of each column
  • It returns the last five rows
  • It returns the first five rows


📌 The Pandas library allows us to read what?

  • Only headers
  • Various datasets into a data frame
  • Only rows


📌 The Pandas library is mostly used for what?

  • Data analysis
  • Machine learning
  • Data visualization


📌 What would the following code segment output from a dataframe df? df.head(5)

  • It would return the first 5 rows of the dataframe
  • It would return the last 5 rows of the dataframe
  • It would return all of the rows of the dataframe


📌 What does the following code segment perform in a dataframe?

mean = df["normalized-losses"].mean()
df["normalized-losses"].replace(np.nan, mean)

  • It replaces the missing values in the column “normalized-losses” with the mean of that column
  • It drops rows that contain missing values
  • It drops all of the rows in the column “normalized-losses”
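
A runnable sketch of the same idea on a made-up column. Note that `replace` returns a new Series, so the result is assigned back here to keep the change in `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"normalized-losses": [100.0, np.nan, 200.0]})

# mean() skips missing values by default.
mean = df["normalized-losses"].mean()

# Fill the missing entries with the column mean and store the result.
df["normalized-losses"] = df["normalized-losses"].replace(np.nan, mean)

print(df)
```

No rows are dropped; the dataframe keeps its original length.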


📌 How would you multiply each element in the column df["c"] by 5 and assign it back to the column df["c"]?

  • 5 * df["b"]
  • df["c"] = 5 * df["c"]
  • df["a"] = df["c"] * 5


📌 What function returns the maximum of the values requested for the requested column?

  • max()
  • std()
  • min()


📌 Since most statistical models cannot take objects or strings as inputs, what action needs to be performed?

  • Convert numerical values into categorical variables
  • Convert categorical variables into numerical values


📌 What function will change the name of a column in a dataframe?

  • rename()
  • replace()
  • exchange()
© 2021 Bae Kim, Powered By Gatsby.