Data Cleansing using R | Fresco Play

Monday, May 22, 2023

Question 1: Ignoring missing values from your dataset is an easier and correct approach than updating the dataset with mean / median values

Answer: May be correct only when more than 30-40% of the data in the affected records is missing

Question 2: Data munging is

Answer: A Process to clean messy data

Question 3: Can a technically correct dataset still be incorrect for data analysis?

Answer: Yes, technically correct dataset does not mean data is clean for analysis

Question 4: Binning is a method to manage ____ data

Answer: Noisy data

Question 5: Data cleaning is the most time consuming process in data analysis

Answer: True

Question 6: tail() function shows ___ by default

Answer: 6 rows

Question 7: print() is the recommended function to view the dataset

Answer: No, not advisable especially when the dataset is large

Question 8: ____ can be used to view data distribution of a single variable AND ____ can be used to view relation between 2 variables

Answer: hist(), plot()

Question 9: Consider cars built-in R dataset and find out what is the median of dist variable

Answer: 36

Question 10: Using the head() function, identify the 8th row of the cars built-in dataset

Answer: 10 26
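The previewing answers above can be checked in a console. The following is a minimal sketch using only base R and the built-in cars dataset; the variable names are illustrative and not part of the quiz.

```r
# cars ships with base R (datasets package): 50 rows, columns speed and dist.

# tail() returns the last 6 rows by default (n = 6).
last_rows <- tail(cars)
nrow(last_rows)

# Median of the dist variable.
dist_median <- median(cars$dist)
dist_median

# head(cars, 8) returns the first 8 rows; the last of them is row 8.
row8 <- head(cars, 8)[8, ]
row8

# hist() shows the distribution of one variable;
# plot() shows the relation between two variables.
hist(cars$dist)
plot(cars$speed, cars$dist)
```

For large datasets, head(), tail(), str() or summary() are usually more practical than print(), which dumps every row.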

Question 11: Identify the function which is part of dplyr package that helps in previewing the data.

Answer: glimpse()

Question 12: In a tidy data set ___ forms a row and ____ forms a column

Answer: Observation, Variable

Question 13: A dataset with columns (country, disease, #ofdeaths) has values Row1 - (CONGO, TB, 28) Row2 - (SPAIN, TB, 2) Row3 - (EGYPT, TB, 0). Is this a tidy or a messy dataset?

Answer: Tidy Data

Question 14: filter() is for selecting columns and select() is for selecting rows

Answer: False

Question 15: ___ allows you to create new variables

Answer: mutate()

Question 16: Which function(s) of dplyr would you use to first subset the columns and then sort them on a particular column?

Answer: select(), arrange()
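A short sketch of how the dplyr verbs from the last few questions fit together, assuming the dplyr package is installed. The choice of mtcars, the column selection and the wt_kg variable are illustrative assumptions, not part of the quiz.

```r
library(dplyr)

small <- mtcars %>%
  filter(cyl == 4) %>%               # filter() keeps rows that match a condition
  select(mpg, cyl, wt) %>%           # select() keeps the named columns
  mutate(wt_kg = wt * 453.592) %>%   # mutate() adds a variable (wt is in 1000 lbs)
  arrange(desc(mpg))                 # arrange() sorts rows, here by mpg descending

glimpse(small)                       # compact preview of the result
```

Note the order: select() subsets columns, filter() subsets rows, and arrange() only reorders rows without dropping any.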

Question 17: What is the class of Sys.time()?

Answer: POSIXct

Question 18: Can a variable of factor type be converted to a date type

Answer: Yes, for example with as.Date(as.character(x)) when the factor levels are valid date strings

Question 19: If time holds the system time 2016-12-21 18:33:31 UTC, what is the output of time + 60?

Answer: "2016-12-21 18:34:31 UTC"
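The date-time arithmetic can be reproduced with a fixed timestamp instead of the live clock. This is a minimal base R sketch; t0 and t1 are illustrative names.

```r
# Build the timestamp from the question as a POSIXct value in UTC.
t0 <- as.POSIXct("2016-12-21 18:33:31", tz = "UTC")

# Sys.time() returns the same kind of object; its class vector starts with "POSIXct".
class(t0)

# Adding a number to a POSIXct value adds that many seconds.
t1 <- t0 + 60
stamp <- format(t1, "%Y-%m-%d %H:%M:%S", tz = "UTC")
stamp
```

Because POSIXct stores seconds since the epoch, time + 60 moves the clock forward by one minute, not by 60 days or hours.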

Question 20: What are the possible outlier treatments?

Answer: All the options mentioned

Question 21: Identify the correct ones

Answer: gather() makes wide data longer

Question 22: ____ is similar to separate() function

Answer: extract()
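To see why extract() is described as similar to separate(), here is a small sketch assuming the tidyr package is installed. The data frame df and its id column are made up for illustration.

```r
library(tidyr)

# Hypothetical data: one column that packs two values together.
df <- data.frame(id = c("a_1", "b_2"), stringsAsFactors = FALSE)

# separate() splits a column on a delimiter.
sep <- separate(df, id, into = c("letter", "number"), sep = "_")

# extract() splits a column using regular-expression capture groups.
ext <- tidyr::extract(df, id, into = c("letter", "number"),
                      regex = "([a-z])_([0-9])")

sep
ext
```

Both calls turn one column into two; separate() works from a separator, while extract() works from capture groups, which helps when there is no clean delimiter.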

Question 23: Which one is NOT a special value in R

Answer: None of the options

Question 24: ____ can be used to identify the existence of a matching pattern in a string

Answer: str_detect()

Question 25: While dealing with missing values in vector x, _____ and _____ produce the same output

Answer: x[!is.na(x)], na.omit(x)

Question 26: In R, what is the result for 0/0

Answer: NaN
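Missing-value handling and the result of 0/0 can be checked directly in a console. This is a minimal base R sketch; the vector x is made-up example data.

```r
# Example vector with missing values (illustrative data).
x <- c(4, NA, 7, NA, 1)

# Logical indexing keeps only the non-missing elements.
kept_index <- x[!is.na(x)]

# na.omit() drops the same elements but also attaches an "na.action"
# attribute recording which positions were removed.
kept_omit <- na.omit(x)

kept_index
as.vector(kept_omit)   # as.vector() strips the extra attribute

# 0/0 is undefined, so R returns the special value NaN (not 0, not NULL).
nan_value <- 0 / 0
is.nan(nan_value)
is.na(nan_value)       # NaN also counts as missing for is.na()
```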

Question 27: Functions that are part of the tidyr package are

Answer: separate()

Question 28: Identify a complementary package to tidyr

Answer: dplyr
