Data Cleansing using R | Fresco Play
Question 1: Ignoring missing values in your dataset is an easier and more correct approach than replacing them with mean / median values
Answer: May be correct only when the records have more than 30-40% of their data missing
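As a rough illustration of the two approaches in Question 1, the sketch below drops the missing values from a vector and, alternatively, replaces them with the median of the observed values. The vector `x` is made up for the example and is not part of the quiz.

```r
# Hypothetical vector with two missing values (illustrative data only)
x <- c(4, NA, 7, 10, NA, 9)

# Approach 1: ignore (drop) the missing values
dropped <- x[!is.na(x)]

# Approach 2: replace each missing value with the median of the observed values
imputed <- x
imputed[is.na(imputed)] <- median(x, na.rm = TRUE)

print(dropped)
print(imputed)
```

Dropping shrinks the data, while imputing keeps every position but fills it with an estimate; which is preferable depends on how much of the data is missing.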
Question 2: Data munging is
Answer: A Process to clean messy data
Question 3: Can a technically correct dataset still be incorrect for data analysis?
Answer: Yes, technically correct dataset does not mean data is clean for analysis
Question 4: Binning is a method to manage ___ data
Answer: Noisy
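One common form of binning is smoothing by bin means: sorted values are split into bins and each value is replaced by the mean of its bin. The sketch below assumes equal-frequency bins of three values each; the numbers and the bin size are chosen only for illustration.

```r
# Hypothetical sorted measurements (illustrative data only)
noisy <- c(4, 8, 15, 21, 21, 24, 25, 28, 34)

# Assign three consecutive values to each bin (equal-frequency binning)
bin <- rep(1:3, each = 3)

# Replace every value with the mean of its bin
smoothed <- ave(noisy, bin, FUN = mean)

print(smoothed)
```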
Question 5: Data cleaning is the most time consuming process in data analysis
Answer: True
Question 6: tail() function shows ___ by default
Answer: 6 rows
Question 7: print() is the recommended function to view the dataset
Answer: No, not advisable especially when the dataset is large
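Questions 6 and 7 can be checked on the built-in cars dataset. This is a minimal sketch: `head()` and `tail()` return six rows unless `n` is given, and `str()` gives a compact overview without printing every row.

```r
# Default previews return 6 rows each
first_rows <- head(cars)
last_rows  <- tail(cars)
print(nrow(first_rows))
print(nrow(last_rows))

# Ask for a different number of rows with n
last_three <- tail(cars, n = 3)
print(last_three)

# Compact structural summary instead of printing the whole dataset
str(cars)
```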
Question 8: ____ can be used to view data distribution of a single variable AND ____ can be used to view relation between 2 variables
Answer: hist(), plot()
Question 9: Consider cars built-in R dataset and find out what is the median of dist variable
Answer: 36
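Question 10: Using head function, identify the 8th row of mtcars built-in dataset
Answer: 10 26 (these values are speed and dist in row 8 of the cars dataset; mtcars has 11 columns, so the question most likely refers to cars)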
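Both Question 9 and Question 10 can be verified directly in an R session. The sketch below assumes the cars dataset is the one intended in both questions.

```r
# Median of the dist variable in the built-in cars dataset
dist_median <- median(cars$dist)
print(dist_median)

# Show the first 8 rows, then pick out the 8th one
first_eight <- head(cars, 8)
row8 <- first_eight[8, ]
print(row8)
```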
Question 11: Identify the function which is part of dplyr package that helps in previewing the data.
Answer: glimpse()
Question 12: In a tidy data set ___ forms a row and ____ forms a column
Answer: Observation, Variable
Question 13: A dataset with columns (country, disease, #ofdeaths) has values Row1 - (CONGO, TB, 28) Row2 - (SPAIN, TB, 2) Row3 - (EGYPT, TB, 0). Is this a tidy or messy dataset?
Answer: Tidy Data
Question 14: filter() is for selecting columns and select() is for selecting rows
Answer: False
Question 15: ___ allows you to make new variables
Answer: mutate()
Question 16: Which function(s) of dplyr would you use to first subset the columns and then sort them on a particular column?
Answer: select(), arrange() (select() subsets columns; filter() subsets rows, as Question 14 notes)
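To make the roles of the dplyr verbs in Questions 14 to 16 concrete, here is a small sketch on the built-in mtcars dataset. It assumes the dplyr package is installed; the choice of columns is arbitrary.

```r
library(dplyr)

# select() keeps columns, arrange() sorts rows by a column
subset_sorted <- mtcars %>%
  select(mpg, cyl) %>%
  arrange(mpg)

# filter() keeps rows that satisfy a condition
four_cyl <- mtcars %>% filter(cyl == 4)

# mutate() adds a new variable computed from existing ones
with_ratio <- mtcars %>% mutate(mpg_per_cyl = mpg / cyl)

print(head(subset_sorted))
```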
Question 17: What is the class of Sys.Date() and Sys.time()
Answer: Date and POSIXct respectively (Sys.time() returns an object of class POSIXct, POSIXt)
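The classes asked about in Question 17 can be inspected with `class()`. Note that R's function names are case-sensitive, so the calls are `Sys.Date()` and `Sys.time()`.

```r
# Sys.Date() returns a calendar date
date_class <- class(Sys.Date())
print(date_class)

# Sys.time() returns a date-time with two classes
time_class <- class(Sys.time())
print(time_class)
```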
Question 18: Can a variable of factor type be converted to a date type
Answer: Yes, as.Date() accepts a factor and converts it through its character labels
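Question 18 can be tested directly: base R's `as.Date()` has a method for factors that works on the factor's labels. The sketch below uses a made-up factor holding one date string.

```r
# Hypothetical factor whose single level is a date string
f <- factor("2016-12-21")

# as.Date() converts the factor via its character label
d <- as.Date(f)

print(class(d))
print(d)
```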
Question 19: If the value of time is the system time 2016-12-21 18:33:31 UTC, what is the output of time+60
Answer: "2016-12-21 18:34:31 UTC"
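The arithmetic in Question 19 works because a POSIXct value counts seconds, so adding 60 moves the time forward by one minute. The sketch below rebuilds the quiz's timestamp explicitly instead of reading the system clock.

```r
# Recreate the timestamp from the question in the UTC time zone
time <- as.POSIXct("2016-12-21 18:33:31", tz = "UTC")

# Adding a number to a POSIXct value adds that many seconds
later <- time + 60

print(format(later, "%Y-%m-%d %H:%M:%S", usetz = TRUE))
```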
Question 20: What are the possible outlier treatments
Answer: All the options mentioned
Question 21: Identify the correct ones
Answer: gather() makes wide data longer (separate() instead splits one column into several)
Question 22: ____ is similar to separate() function
Answer: extract()
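The tidyr functions named in Questions 21 and 22 can be compared on a tiny example. This sketch assumes the tidyr package is installed; the data frames, including the HIV column, are invented for illustration.

```r
library(tidyr)

# Hypothetical wide table: one column per disease (illustrative numbers)
wide <- data.frame(country = c("CONGO", "SPAIN"), TB = c(28, 2), HIV = c(5, 1))

# gather() stacks the disease columns, making the data longer
long <- gather(wide, key = "disease", value = "deaths", TB, HIV)

# separate() splits one column into several using a separator
combined <- data.frame(id = c("CONGO_TB", "SPAIN_TB"))
split_sep <- separate(combined, id, into = c("country", "disease"), sep = "_")

# extract() does a similar job using regular-expression capture groups
split_ext <- extract(combined, id, into = c("country", "disease"),
                     regex = "([A-Z]+)_([A-Z]+)")

print(long)
print(split_sep)
```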
Question 23: Which one is NOT a special value in R
Answer: None of the options
Question 24: ____ can be used to identify the existence of a matching pattern in a string
Answer: str_detect()
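As a quick illustration of Question 24, `str_detect()` returns one logical value per input string. The sketch assumes the stringr package is installed; the strings and the pattern are arbitrary.

```r
library(stringr)

# Hypothetical strings to search (illustrative only)
fruits <- c("apple", "banana", "cherry")

# TRUE where the pattern "an" occurs in the string
has_an <- str_detect(fruits, "an")

print(has_an)
```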
Question 25: While dealing with missing values in vector x, _____ and _____ results in the same output
Answer: x[!is.na(x)], na.omit(x)
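The equivalence in Question 25 can be checked on a small vector. One detail worth knowing: `na.omit()` returns the same values but also attaches an `na.action` attribute recording which positions were dropped. The vector below is made up for the example.

```r
# Hypothetical vector with missing values
x <- c(1, NA, 3, NA, 5)

# Logical indexing keeps only the non-missing values
by_index <- x[!is.na(x)]

# na.omit() drops the same values and records what it removed
by_omit <- na.omit(x)

print(by_index)
print(as.vector(by_omit))
print(attr(by_omit, "na.action"))
```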
Question 26: In R, what is the result for 0/0
Answer: NaN (Not a Number), not 0 or NULL
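The result of 0/0 and R's other special values (see Question 23) can be inspected in a session. This is a minimal sketch using only base R.

```r
# Division of zero by zero is undefined and yields NaN
zero_by_zero <- 0 / 0
print(zero_by_zero)

# NaN is also treated as missing by is.na()
print(is.nan(zero_by_zero))
print(is.na(zero_by_zero))

# Other special values: Inf, NA and NULL
one_by_zero <- 1 / 0
print(one_by_zero)
print(is.na(NA))
print(is.null(NULL))
```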
Question 27: Which function is part of the tidyr package
Answer: separate()
Question 28: Identify a complementary package to tidyr
Answer: dplyr