Data Cleansing using R | Fresco Play

Monday, May 22, 2023

Question 1: Ignoring missing values in your dataset is an easier and more correct approach than updating the dataset with mean / median values

Answer: May be correct only when the records have more than 30-40% of their data missing
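
The two approaches can be compared on a small example. This is a minimal sketch in base R; the vector `x` and its values are hypothetical and only there for illustration.

```r
# Hypothetical vector with missing values (illustrative data only)
x <- c(10, NA, 30, NA, 50)

# Option 1: ignore the missing values when summarising
mean_ignoring <- mean(x, na.rm = TRUE)

# Option 2: impute the missing entries with the mean of the observed values
x_imputed <- ifelse(is.na(x), mean_ignoring, x)

# Share of missing values, useful when deciding between the two approaches
missing_share <- mean(is.na(x))
```

Here 40% of the values are missing, so imputing with the mean fills almost half of the vector with a single repeated value.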


Question 2: Data munging is

Answer: A Process to clean messy data


Question 3: Can a technically correct dataset still be incorrect for data analysis?

Answer: Yes, a technically correct dataset does not mean the data is clean for analysis


Question 4: Binning is a method to manage ___

Answer: Noisy data
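
One common way to bin in base R is `cut()`, followed by replacing each value with its bin mean. The sketch below assumes equal-width bins and uses a hypothetical `price` vector; it is one possible smoothing scheme, not the only one.

```r
# Hypothetical noisy measurements (illustrative values only)
price <- c(4, 8, 15, 21, 21, 24, 25, 28, 34)

# Split the observed range into 3 equal-width bins
bins <- cut(price, breaks = 3, labels = c("low", "mid", "high"))

# Smooth by replacing each value with the mean of its bin
smoothed <- ave(price, bins, FUN = mean)

# Inspect the original values next to their bins and smoothed values
print(data.frame(price, bins, smoothed))
```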


Question 5: Data cleaning is the most time consuming process in data analysis

Answer: True


Question 6: tail() function shows ___ by default

Answer: 6 rows


Question 7: print() is the recommended function to view the dataset

Answer: No, not advisable especially when the dataset is large
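
The previous two answers can be checked on the built-in `mtcars` dataset. A minimal sketch of previewing a data frame without printing all of it:

```r
# tail() returns the last 6 rows when n is not given
last_rows <- tail(mtcars)

# head() takes an explicit row count as its second argument
first_rows <- head(mtcars, 3)

# str() gives a compact structural summary instead of printing every row
str(mtcars)
```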


Question 8: ____ can be used to view data distribution of a single variable AND ____ can be used to view relation between 2 variables

Answer: hist(), plot()


Question 9: Consider cars built-in R dataset and find out what is the median of dist variable

Answer: 36
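
The value can be reproduced directly, since `cars` ships with base R:

```r
# Median of the dist column in the built-in cars dataset
dist_median <- median(cars$dist)
print(dist_median)

# summary() reports the median alongside the other quartiles
print(summary(cars$dist))
```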


Question 10: Using head function, identify the 8th row of mtcars built-in dataset

Answer: 10 26 (speed 10, dist 26). These values are row 8 of the cars dataset; row 8 of mtcars is Merc 240D.
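
A quick check with `head()`. The values 10 and 26 correspond to row 8 of `cars` (columns `speed` and `dist`), so the sketch looks at both built-in datasets for comparison.

```r
# Take the first 8 rows, then keep only the last of them
cars_row8 <- head(cars, 8)[8, ]
print(cars_row8)

# The same lookup on mtcars, which has named rows
mtcars_row8 <- head(mtcars, 8)[8, ]
print(rownames(mtcars_row8))
```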


Question 11: Identify the function which is part of dplyr package that helps in previewing the data.

Answer: glimpse()


Question 12: In a tidy data set ___ forms a row and ____ forms a column

Answer: Observation, Variable


Question 13: A dataset with columns (country, disease, #ofdeaths) has values Row1 - (CONGO, TB, 28) Row2 - (SPAIN, TB, 2) Row3 - (EGYPT, TB, 0). Is this a tidy or messy dataset?

Answer: Tidy Data


Question 14: filter() is for selecting columns and select() is for selecting rows

Answer: False


Question 15: ___ allows to make new variables

Answer: mutate()


Question 16: Which function(s) of dplyr would you use to first subset the columns and then sort them on a particular column?

Answer: select(), arrange() (select() subsets columns; filter() subsets rows)
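
The dplyr verbs from the last few questions can be chained together. This is a minimal sketch assuming the dplyr package is installed; the derived column name `mpg_per_wt` is made up for illustration.

```r
library(dplyr)

result <- mtcars %>%
  filter(cyl == 4) %>%              # filter() keeps rows that match a condition
  select(mpg, wt) %>%               # select() keeps the named columns
  mutate(mpg_per_wt = mpg / wt) %>% # mutate() adds a new variable
  arrange(desc(mpg))                # arrange() sorts rows, here by descending mpg

print(head(result, 3))
```

Note that `select()` is the verb that subsets columns, while `filter()` works on rows.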


Question 17: What is the class of Sys.Date() and Sys.time()

Answer: Date for Sys.Date(), POSIXct for Sys.time()
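
The classes can be inspected directly in base R. Note the capitalisation: the functions are `Sys.Date()` and `Sys.time()`, and they return different classes.

```r
# Sys.Date() returns an object of class "Date"
date_class <- class(Sys.Date())
print(date_class)

# Sys.time() returns a date-time of class "POSIXct" (with parent class "POSIXt")
time_class <- class(Sys.time())
print(time_class)
```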


Question 18: Can a variable of factor type be converted to a date type

Answer: Yes, for example with as.Date(as.character(x))
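
In practice, a factor that holds date strings can be converted by going through character first. A minimal sketch with hypothetical values, assuming the strings are in the default `YYYY-MM-DD` format:

```r
# Hypothetical factor holding date strings
f <- factor(c("2016-12-21", "2017-01-05"))

# Convert the labels to character first, then to Date
d <- as.Date(as.character(f))

# Date arithmetic now works on the result
gap_days <- as.numeric(d[2] - d[1])
print(gap_days)
```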


Question 19: If value of time is system time which is 2016-12-21 18:33:31 UTC. What is the output for time+60

Answer: "2016-12-21 18:34:31 UTC"
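
Adding a number to a POSIXct value adds that many seconds. The sketch below builds the timestamp from the question explicitly rather than reading the system clock, so the result is reproducible.

```r
# Fixed timestamp from the question, created in UTC
time <- as.POSIXct("2016-12-21 18:33:31", tz = "UTC")

# Arithmetic on POSIXct is in seconds, so this moves forward one minute
later <- time + 60

later_text <- format(later, "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(later_text)
```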


Question 20: What are the possible outlier treatments

Answer: All the options mentioned


Question 21: Identify the correct ones

Answer: gather() makes wide data longer (separate() instead splits one column into several columns)


Question 22: ____ is similar to separate() function

Answer: extract()
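
Both functions split one column into several: `separate()` splits on a delimiter, while `extract()` uses regular-expression capture groups. A minimal sketch assuming the tidyr package is installed; the data frame and column names are hypothetical.

```r
library(tidyr)

# Hypothetical data frame with a compound identifier column
df <- data.frame(id = c("A-1", "B-2"), stringsAsFactors = FALSE)

# separate() splits the column on a delimiter
sep <- separate(df, id, into = c("letter", "number"), sep = "-")

# extract() pulls out the regex capture groups instead
ext <- extract(df, id, into = c("letter", "number"), regex = "([A-Z])-([0-9])")

print(sep)
print(ext)
```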


Question 23: Which one is NOT a special value in R

Answer: None of the options


Question 24: ____ can be used to identify the existence of a matching pattern in a string

Answer: str_detect()


Question 25: While dealing with missing values in vector x, _____ and _____ results in the same output

Answer: x[!is.na(x)], na.omit(x)
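
The two expressions keep the same values; `na.omit()` additionally records which positions were dropped in an attribute. A minimal sketch with a hypothetical vector:

```r
# Hypothetical vector with missing values
x <- c(3, NA, 7, NA, 9)

# Logical indexing keeps only the non-missing entries
kept_by_index <- x[!is.na(x)]

# na.omit() keeps the same entries and stores the dropped positions
kept_by_omit <- na.omit(x)

print(kept_by_index)
print(as.vector(kept_by_omit))   # as.vector() strips the extra attributes
print(attr(kept_by_omit, "na.action"))
```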


Question 26: In R, what is the result for 0/0

Answer: NaN
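
In base R, `0/0` evaluates to `NaN` (not a number), which is distinct from `0`, `NULL`, and `Inf`. This can be checked directly:

```r
# 0/0 is undefined, so R returns NaN
zero_by_zero <- 0 / 0
print(zero_by_zero)

# Dividing a non-zero number by zero gives Inf instead
one_by_zero <- 1 / 0
print(one_by_zero)

# NaN also counts as missing for is.na()
print(is.na(zero_by_zero))
```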


Question 27: Functions that are part of the tidyr package are

Answer: separate()


Question 28: Identify a complementary package to tidyr

Answer: dplyr

