Do we really need box plots?
If violin plots can reveal additional features like skewness, clusters and multiple peaks, is there ever any reason to use a box plot instead of a violin plot? I've been asking myself this for a while, and here are some possible answers:
- Violin plots can bury the median in a sea of density.
- Violin plots rely on kernel density estimation. With small n (roughly under 30, as a rule of thumb), the KDE becomes unstable or misleading, because its shape depends heavily on the chosen bandwidth.
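- Box plots are reproducible and close to parameter-free: they are built from quartiles and a conventional whisker rule, not from a kernel and bandwidth that have to be chosen.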
- Violin plots scale poorly with the number of categories.
- Box plots explicitly mark outliers.
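
To make the small-n point concrete, here is a rough sketch using only the Python standard library. It is a hand-rolled Gaussian KDE, not what any particular plotting library does internally, and the sample size, seed, and bandwidth multipliers are arbitrary choices for illustration. It counts how many peaks the density estimate shows for the same 15 points at different bandwidths, and prints the five-number summary a box plot would be built from.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a fixed-bandwidth Gaussian KDE."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def count_peaks(sample, bandwidth, grid_size=1000):
    """Count local maxima of the KDE on an evenly spaced grid."""
    lo = min(sample) - 3 * bandwidth
    hi = max(sample) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    density = gaussian_kde(sample, bandwidth)
    ys = [density(lo + i * step) for i in range(grid_size)]
    return sum(
        1 for i in range(1, grid_size - 1) if ys[i - 1] < ys[i] > ys[i + 1]
    )


def five_number_summary(sample):
    """Min, Q1, median, Q3, max: the numbers a box plot is drawn from."""
    # The quantile method is itself a convention; "inclusive" is one option.
    q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    return min(sample), q1, median, q3, max(sample)


random.seed(0)  # arbitrary seed so the sketch is repeatable
sample = [random.gauss(0.0, 1.0) for _ in range(15)]  # small n, one true peak

spread = statistics.stdev(sample)
for factor in (0.1, 0.5, 1.0):  # illustrative bandwidth multipliers
    peaks = count_peaks(sample, factor * spread)
    print(f"bandwidth = {factor:.1f} * sd -> {peaks} peak(s)")

print("five-number summary:", five_number_summary(sample))
```

The data come from a single normal distribution, so any extra peaks at the narrow bandwidths are artifacts of the smoothing choice, while the five-number summary is the same on every run. To be fair to violins, the box plot is not entirely convention-free either: the quantile method and the whisker rule are choices too, just ones that rarely change the picture much.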
A violin plot asks what the distribution looks like, while a box plot asks where the distribution is located and how spread out it is. Those aren't really the same question.