Think about the past and the future of data analysis
- Everyone is a data analyst, whether they know it or not and whether they like it or not.
- Training for data analysis is essential, but there is limited bandwidth.
- We can automate or we can teach. Neither is perfect.
- Tukey JW, The Future of Data Analysis: If data analysis is to be well done, much of it must be a matter of judgement, and theory, whether statistical or non-statistical, will have to guide and not command.
- Theory comes to the rescue. It provides a scalable understanding of what's good and what's not, as it does in music.
- Today, there is still a lot of discussion about the "instruments" rather than the "music" that is being produced.
- We have a good idea of what makes an analysis good a posteriori, but not a priori.
- Basically, to reproduce the results you need to be provided with everything (both software and data). You also need to be quite knowledgeable in the field to understand what's going on.
- Examples of aesthetics we could apply to data analysis:
  - Reproducible: analytic code, software packages, VCS, data formatting, metadata, documentation, distribution
  - Translatable: modularised analysis pipelines, quantitative programming environment, generalisable principles, reusable code practices, APIs
  - Robust to New Data: statistical techniques, assertive testing, input validity, modular design, code review, fail loudly and early
  - Simple to Communicate: statistical technique, medium, peer review
- What makes a good data analysis? What makes a good data analyst?
- For data visualisation, though, this is basically figured out, thanks to science that tested what works and what doesn't.
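
The "Robust to New Data" aesthetic listed above (assertive testing, input validity, fail loudly and early) can be sketched in a few lines of Python. This is only an illustration, not something shown in the talk: the column names, the bounds in `EXPECTED_COLUMNS`, and the functions `validate_rows` and `mean_weight` are all hypothetical.

```python
import math

# Hypothetical schema, assumed for illustration only:
# column name -> (expected type, minimum, maximum).
EXPECTED_COLUMNS = {
    "age": (int, 0, 120),
    "weight_kg": (float, 1.0, 500.0),
}


def validate_rows(rows):
    """Return rows unchanged, or raise ValueError at the first invalid row."""
    if not rows:
        raise ValueError("no input rows: refusing to analyse an empty data set")
    for i, row in enumerate(rows):
        # Check that every expected column is present before looking at values.
        missing = set(EXPECTED_COLUMNS) - set(row)
        if missing:
            raise ValueError(f"row {i}: missing columns {sorted(missing)}")
        for name, (kind, low, high) in EXPECTED_COLUMNS.items():
            value = row[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, kind):
                raise ValueError(f"row {i}: {name}={value!r} is not {kind.__name__}")
            if isinstance(value, float) and math.isnan(value):
                raise ValueError(f"row {i}: {name} is NaN")
            if not low <= value <= high:
                raise ValueError(f"row {i}: {name}={value!r} outside [{low}, {high}]")
    return rows


def mean_weight(rows):
    """Validate first, so bad input stops the analysis instead of skewing it."""
    rows = validate_rows(rows)
    return sum(row["weight_kg"] for row in rows) / len(rows)
```

With this in place, `mean_weight([{"age": 30, "weight_kg": 70.0}, {"age": 40, "weight_kg": 80.0}])` returns `75.0`, while a row with `"weight_kg": -5.0` raises a `ValueError` naming the row and column rather than quietly producing a wrong average. Validating at the entry point is one possible design; the same checks could equally live in a separate test suite run against each new data delivery.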
Metadata