pacman::p_load(palmerpenguins, tidyverse,
               report, skimr, summarytools)
Let’s explore
For the dataset
Rough and quick
str(penguins) # Far better
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
$ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
$ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
$ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
$ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
$ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
$ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
$ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
$ year : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
glimpse(penguins)
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <fct> male, female, female, NA, female, male, female, male…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
report(penguins)
The data contains 344 observations of the following 8 variables:
- species: 3 levels, namely Adelie (n = 152, 44.19%), Chinstrap (n = 68,
19.77%) and Gentoo (n = 124, 36.05%)
- island: 3 levels, namely Biscoe (n = 168, 48.84%), Dream (n = 124, 36.05%)
and Torgersen (n = 52, 15.12%)
- bill_length_mm: n = 344, Mean = 43.92, SD = 5.46, Median = , MAD = 7.04,
range: [32.10, 59.60], Skewness = 0.05, Kurtosis = -0.88, 0.58% missing
- bill_depth_mm: n = 344, Mean = 17.15, SD = 1.97, Median = , MAD = 2.22,
range: [13.10, 21.50], Skewness = -0.14, Kurtosis = -0.91, 0.58% missing
- flipper_length_mm: n = 344, Mean = 200.92, SD = 14.06, Median = , MAD =
16.31, range: [172, 231], Skewness = 0.35, Kurtosis = -0.98, 0.58% missing
- body_mass_g: n = 344, Mean = 4201.75, SD = 801.95, Median = , MAD = 889.56,
range: [2700, 6300], Skewness = 0.47, Kurtosis = -0.72, 0.58% missing
- sex: 2 levels, namely female (n = 165, 47.97%), male (n = 168, 48.84%) and
missing (n = 11, 3.20%)
- year: n = 344, Mean = 2008.03, SD = 0.82, Median = 2008.00, MAD = 1.48,
range: [2007, 2009], Skewness = -0.05, Kurtosis = -1.50, 0% missing
psych::describe(penguins)
vars n mean sd median trimmed mad min max
species* 1 344 1.92 0.89 2.00 1.90 1.48 1.0 3.0
island* 2 344 1.66 0.73 2.00 1.58 1.48 1.0 3.0
bill_length_mm 3 342 43.92 5.46 44.45 43.91 7.04 32.1 59.6
bill_depth_mm 4 342 17.15 1.97 17.30 17.17 2.22 13.1 21.5
flipper_length_mm 5 342 200.92 14.06 197.00 200.34 16.31 172.0 231.0
body_mass_g 6 342 4201.75 801.95 4050.00 4154.01 889.56 2700.0 6300.0
sex* 7 333 1.50 0.50 2.00 1.51 0.00 1.0 2.0
year 8 344 2008.03 0.82 2008.00 2008.04 1.48 2007.0 2009.0
range skew kurtosis se
species* 2.0 0.16 -1.73 0.05
island* 2.0 0.61 -0.91 0.04
bill_length_mm 27.5 0.05 -0.89 0.30
bill_depth_mm 8.4 -0.14 -0.92 0.11
flipper_length_mm 59.0 0.34 -1.00 0.76
body_mass_g 3600.0 0.47 -0.74 43.36
sex* 1.0 -0.02 -2.01 0.03
year 2.0 -0.05 -1.51 0.04
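The report() output above leaves Median blank for the four measurement columns, which are also the numeric columns with missing values, while psych::describe() reports their medians from the non-missing rows. A minimal base R sketch for checking both the missing counts and the medians directly is given here. The names n_missing, measure_cols, and medians are illustrative, and linking the blank medians to missing values is an assumption, not something the output confirms.

```r
library(palmerpenguins)

# Count missing values in every column of the dataset
n_missing <- colSums(is.na(penguins))
print(n_missing)

# Medians of the four measurement columns, dropping missing values first
measure_cols <- c("bill_length_mm", "bill_depth_mm",
                  "flipper_length_mm", "body_mass_g")
medians <- sapply(penguins[measure_cols], median, na.rm = TRUE)
print(medians)
```

The medians should agree with the median column of psych::describe() and the p50 column of skim().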
Neat and quick
skim(penguins)
Name | penguins |
Number of rows | 344 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
factor | 3 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
species | 0 | 1.00 | FALSE | 3 | Ade: 152, Gen: 124, Chi: 68 |
island | 0 | 1.00 | FALSE | 3 | Bis: 168, Dre: 124, Tor: 52 |
sex | 11 | 0.97 | FALSE | 2 | mal: 168, fem: 165 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
bill_length_mm | 2 | 0.99 | 43.92 | 5.46 | 32.1 | 39.23 | 44.45 | 48.5 | 59.6 | ▃▇▇▆▁ |
bill_depth_mm | 2 | 0.99 | 17.15 | 1.97 | 13.1 | 15.60 | 17.30 | 18.7 | 21.5 | ▅▅▇▇▂ |
flipper_length_mm | 2 | 0.99 | 200.92 | 14.06 | 172.0 | 190.00 | 197.00 | 213.0 | 231.0 | ▂▇▃▅▂ |
body_mass_g | 2 | 0.99 | 4201.75 | 801.95 | 2700.0 | 3550.00 | 4050.00 | 4750.0 | 6300.0 | ▃▇▆▃▂ |
year | 0 | 1.00 | 2008.03 | 0.82 | 2007.0 | 2007.00 | 2008.00 | 2009.0 | 2009.0 | ▇▁▇▁▇ |
penguins |>
  group_by(species) |>
  skim()
Name | group_by(penguins, specie… |
Number of rows | 344 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
factor | 2 |
numeric | 5 |
________________________ | |
Group variables | species |
Variable type: factor
skim_variable | species | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|---|
island | Adelie | 0 | 1.00 | FALSE | 3 | Dre: 56, Tor: 52, Bis: 44 |
island | Chinstrap | 0 | 1.00 | FALSE | 1 | Dre: 68, Bis: 0, Tor: 0 |
island | Gentoo | 0 | 1.00 | FALSE | 1 | Bis: 124, Dre: 0, Tor: 0 |
sex | Adelie | 6 | 0.96 | FALSE | 2 | fem: 73, mal: 73 |
sex | Chinstrap | 0 | 1.00 | FALSE | 2 | fem: 34, mal: 34 |
sex | Gentoo | 5 | 0.96 | FALSE | 2 | mal: 61, fem: 58 |
Variable type: numeric
skim_variable | species | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|---|
bill_length_mm | Adelie | 1 | 0.99 | 38.79 | 2.66 | 32.1 | 36.75 | 38.80 | 40.75 | 46.0 | ▁▆▇▆▁ |
bill_length_mm | Chinstrap | 0 | 1.00 | 48.83 | 3.34 | 40.9 | 46.35 | 49.55 | 51.08 | 58.0 | ▂▇▇▅▁ |
bill_length_mm | Gentoo | 1 | 0.99 | 47.50 | 3.08 | 40.9 | 45.30 | 47.30 | 49.55 | 59.6 | ▃▇▆▁▁ |
bill_depth_mm | Adelie | 1 | 0.99 | 18.35 | 1.22 | 15.5 | 17.50 | 18.40 | 19.00 | 21.5 | ▂▆▇▃▁ |
bill_depth_mm | Chinstrap | 0 | 1.00 | 18.42 | 1.14 | 16.4 | 17.50 | 18.45 | 19.40 | 20.8 | ▅▇▇▆▂ |
bill_depth_mm | Gentoo | 1 | 0.99 | 14.98 | 0.98 | 13.1 | 14.20 | 15.00 | 15.70 | 17.3 | ▅▇▇▆▂ |
flipper_length_mm | Adelie | 1 | 0.99 | 189.95 | 6.54 | 172.0 | 186.00 | 190.00 | 195.00 | 210.0 | ▁▆▇▅▁ |
flipper_length_mm | Chinstrap | 0 | 1.00 | 195.82 | 7.13 | 178.0 | 191.00 | 196.00 | 201.00 | 212.0 | ▁▅▇▅▂ |
flipper_length_mm | Gentoo | 1 | 0.99 | 217.19 | 6.48 | 203.0 | 212.00 | 216.00 | 221.00 | 231.0 | ▂▇▇▆▃ |
body_mass_g | Adelie | 1 | 0.99 | 3700.66 | 458.57 | 2850.0 | 3350.00 | 3700.00 | 4000.00 | 4775.0 | ▅▇▇▃▂ |
body_mass_g | Chinstrap | 0 | 1.00 | 3733.09 | 384.34 | 2700.0 | 3487.50 | 3700.00 | 3950.00 | 4800.0 | ▁▅▇▃▁ |
body_mass_g | Gentoo | 1 | 0.99 | 5076.02 | 504.12 | 3950.0 | 4700.00 | 5000.00 | 5500.00 | 6300.0 | ▃▇▇▇▂ |
year | Adelie | 0 | 1.00 | 2008.01 | 0.82 | 2007.0 | 2007.00 | 2008.00 | 2009.00 | 2009.0 | ▇▁▇▁▇ |
year | Chinstrap | 0 | 1.00 | 2007.97 | 0.86 | 2007.0 | 2007.00 | 2008.00 | 2009.00 | 2009.0 | ▇▁▆▁▇ |
year | Gentoo | 0 | 1.00 | 2008.08 | 0.79 | 2007.0 | 2007.00 | 2008.00 | 2009.00 | 2009.0 | ▆▁▇▁▇ |
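When only a few grouped statistics are needed, the same per-species numbers can be computed by hand. The following is a rough dplyr sketch for body mass only. The object species_summary and its column names are made up for illustration, and na.rm = TRUE is assumed to match how skim() treats missing values.

```r
library(palmerpenguins)
library(dplyr)

# Per-species count, missing count, mean and SD of body mass
species_summary <- penguins |>
  group_by(species) |>
  summarise(
    n              = n(),
    n_missing_mass = sum(is.na(body_mass_g)),
    mean_mass_g    = mean(body_mass_g, na.rm = TRUE),
    sd_mass_g      = sd(body_mass_g, na.rm = TRUE)
  )

print(species_summary)
```

The mean_mass_g and sd_mass_g columns should line up with the body_mass_g rows of the grouped skim() table.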
dfSummary(penguins)
Data Frame Summary
penguins
Dimensions: 344 x 8
Duplicates: 0
No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing |
---|---|---|---|---|---|
1 | species [factor] | 1. Adelie; 2. Chinstrap; 3. Gentoo | 152 (44.2%); 68 (19.8%); 124 (36.0%) | 344 (100.0%) | 0 (0.0%) |
2 | island [factor] | 1. Biscoe; 2. Dream; 3. Torgersen | 168 (48.8%); 124 (36.0%); 52 (15.1%) | 344 (100.0%) | 0 (0.0%) |
3 | bill_length_mm [numeric] | Mean (sd): 43.9 (5.5); min < med < max: 32.1 < 44.5 < 59.6; IQR (CV): 9.3 (0.1) | 164 distinct values | 342 (99.4%) | 2 (0.6%) |
4 | bill_depth_mm [numeric] | Mean (sd): 17.2 (2); min < med < max: 13.1 < 17.3 < 21.5; IQR (CV): 3.1 (0.1) | 80 distinct values | 342 (99.4%) | 2 (0.6%) |
5 | flipper_length_mm [integer] | Mean (sd): 200.9 (14.1); min < med < max: 172 < 197 < 231; IQR (CV): 23 (0.1) | 55 distinct values | 342 (99.4%) | 2 (0.6%) |
6 | body_mass_g [integer] | Mean (sd): 4201.8 (802); min < med < max: 2700 < 4050 < 6300; IQR (CV): 1200 (0.2) | 94 distinct values | 342 (99.4%) | 2 (0.6%) |
7 | sex [factor] | 1. female; 2. male | 165 (49.5%); 168 (50.5%) | 333 (96.8%) | 11 (3.2%) |
8 | year [integer] | Mean (sd): 2008 (0.8); min < med < max: 2007 < 2008 < 2009; IQR (CV): 2 (0) | 2007: 110 (32.0%); 2008: 114 (33.1%); 2009: 120 (34.9%) | 344 (100.0%) | 0 (0.0%) |
#view(dfSummary(penguins))
For individual variables
Continuous variables
Boxplots, histograms/density plots
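As a quick illustration of these plot types for one continuous variable, here is a minimal ggplot2 sketch using body_mass_g. The choice of variable, the 30 bins, and the object names (mass, p_box, p_hist, p_dens) are assumptions made for the example; rows with a missing body mass are dropped up front so the plots do not emit warnings.

```r
library(palmerpenguins)
library(ggplot2)

# Keep only rows where body mass was recorded
mass <- subset(penguins, !is.na(body_mass_g))

# Boxplot: body mass by species
p_box <- ggplot(mass, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body mass (g)")

# Histogram: overall distribution of body mass
p_hist <- ggplot(mass, aes(x = body_mass_g)) +
  geom_histogram(bins = 30) +
  labs(x = "Body mass (g)", y = "Count")

# Density plot: one semi-transparent curve per species
p_dens <- ggplot(mass, aes(x = body_mass_g, fill = species)) +
  geom_density(alpha = 0.5) +
  labs(x = "Body mass (g)", y = "Density")

print(p_box)
print(p_hist)
print(p_dens)
```

Swapping body_mass_g for another measurement column gives the same three views for that variable.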
Shiny app - Interactive
https://jgassen.shinyapps.io/expand/ # but I am unable to use this
library(ExPanDaR)
Warning: package 'ExPanDaR' was built under R version 4.2.3
#ExPanD(penguins)