Exploratory Data Analysis packages
EDA, or Exploratory Data Analysis, can take many forms. This brief section recommends a few packages that can explore your data more or less automagically. The packages can be complex and may take some effort to learn. Still, if you’re brand new to data mining, you may benefit from reading each package’s documentation pages and then applying its functions to your data.
Recommended EDA packages
{skimr} - https://docs.ropensci.org/skimr
a frictionless approach to summary statistics
{gtExtras} - https://jthomasmock.github.io/gtExtras/reference/gt_plt_summary.html
create a summary table with histograms or area bar charts from a dataframe
{DataExplorer} - https://boxuancui.github.io/DataExplorer/reference/plot_intro.html
plot basic information about a dataset
{corrplot} - https://github.com/taiyun/corrplot
a visual exploratory tool on correlation matrices that supports automatic variable reordering
{summarytools} - https://github.com/dcomtois/summarytools
for data cleaning, exploring, and simple reporting
{tableone} - https://github.com/kaz-yos/tableone
create “Table 1”, a description of baseline characteristics
{dtrackr} - https://terminological.github.io/dtrackr
documents a data pipeline, a first step to reproducibility, as a flow chart describing the steps taken to prepare data
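As a starting point, here is a minimal sketch of what a first pass with a few of these packages might look like. It assumes the packages are installed and uses R's built-in mtcars data as a stand-in for your own data frame. Each call is guarded so the script still runs if a package is missing. See each package's documentation for the full set of options.

```r
# Stand-in data (an assumption for illustration); replace with your own data frame
df <- mtcars

# {skimr}: summary statistics for every column
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(df))
}

# {DataExplorer}: plot basic information (column types, missing values)
if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_intro(df)
}

# {corrplot}: correlation matrix, with variables reordered by hierarchical clustering
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor(df), order = "hclust")
}

# {summarytools}: a data frame summary for cleaning and simple reporting
if (requireNamespace("summarytools", quietly = TRUE)) {
  print(summarytools::dfSummary(df))
}
```

All columns of mtcars are numeric, so cor() works on it directly. With your own data you would likely need to select the numeric columns first.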