Think about the past and the future of data analysis

  • data-analytics
  • idea
  • talk
  • winter
  • Everyone is a data analyst, whether they know it or not and whether they like it or not.
  • Training for data analysis is essential, but there is limited bandwidth.
  • We can automate or teach. Neither is perfect.
  • Tukey JW, The Future of Data Analysis: "If data analysis is to be well done, much of it must be a matter of judgement, and theory, whether statistical or non-statistical, will have to guide, not command."
  • Theory comes to the rescue. It provides a scalable understanding of what's good and what's not, as it does in music.
  • Today, there is still a lot of discussion about the "instruments" rather than the "music" that is being produced.
  • We have a good idea of what makes an analysis good a posteriori, but not a priori.
  • Basically, you need to be provided with everything (both software and data) to be able to reproduce the results. You also need to be quite knowledgeable in the field to understand what's going on.
  • Examples of aesthetics we could apply to data analysis:
    • Reproducible: analytic code, software packages, VCS, data formatting, metadata, documentation, distribution
    • Translatable: modularised analysis pipelines, quantitative programming environment, generalisable principles, reusable code practices, APIs
    • Robust to New Data: statistical techniques, assertive testing, input validity, modular design, code review, fail loudly and early
    • Simple to Communicate: statistical technique, medium, peer review
  • What makes a good data analysis? What makes a good data analyst?
  • It's basically figured out for data visualisation, though, thanks to science and to testing what works and what doesn't.
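
The "Robust to New Data" aesthetic listed above (assertive testing, input validity, fail loudly and early) can be made concrete with a small sketch. This is only an illustration and not something shown in the talk: the function names validate_measurements and mean_value, and the columns subject_id and value, are hypothetical. The idea is that every row is checked before any analysis runs, and the first problem raises an error instead of silently skewing the result.

```python
import math

# Hypothetical schema for illustration: each row is a dict with these keys.
REQUIRED_COLUMNS = {"subject_id", "value"}


def validate_measurements(rows):
    """Check every row up front and raise on the first problem found."""
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"row {i}: missing columns {sorted(missing)}")
        value = row["value"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(
                f"row {i}: value must be numeric, got {type(value).__name__}"
            )
        if math.isnan(value) or value < 0:
            raise ValueError(
                f"row {i}: value must be a non-negative number, got {value!r}"
            )
    return rows


def mean_value(rows):
    """Analysis step: it only runs once the input has passed validation."""
    rows = validate_measurements(rows)
    if not rows:
        raise ValueError("no rows to analyse")
    return sum(row["value"] for row in rows) / len(rows)


if __name__ == "__main__":
    good = [{"subject_id": 1, "value": 2.0}, {"subject_id": 2, "value": 4.0}]
    print(mean_value(good))

    bad = good + [{"subject_id": 3, "value": float("nan")}]
    try:
        mean_value(bad)
    except ValueError as err:
        print(f"rejected: {err}")
```

Keeping validation separate from the analysis step also touches the "modular design" item: the same checks can be reused when new data arrives, and a reviewer can read the assumptions about the input in one place.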