Data analysis methods in STEM


Data analysis involves processing, ordering, interpreting and presenting your data in a meaningful way to find patterns and derive insights from a data set. Which data analysis method to apply depends on your research topic, methodology and data collection approach, and you should familiarise yourself with the conventions in your field. Evaluating and selecting the most suitable analytical approach before data collection begins will help you make the most of your data set and ensure that you collect the right data to answer your research question.

Be open to mixed data analysis methods, which can provide a fuller understanding and help overcome the limitations of individual approaches. Also investigate digital tools (for example, software and online resources) that may help you analyse your data more efficiently.


Read on to learn about different data analysis methods. Then record your response to the questions and consider the feedback.

Statistical analysis

Used in hypothesis testing to estimate differences between two or more groups or to measure the relationships between two or more variables.
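One way to test for a difference between two groups is a permutation test. The sketch below is a minimal illustration, assuming two independent groups and using the difference in means as the test statistic; the function name and the measurements are hypothetical. It reshuffles the group labels many times and counts how often a difference at least as large as the observed one appears when group membership is random, which is what we would expect if the null hypothesis of no difference were true.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of a minus mean of b) and an
    estimated p-value: the proportion of random relabellings whose absolute
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Counting the observed labelling as one permutation keeps p above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements from a control and a treated group.
control = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8]
treated = [4.8, 5.1, 4.9, 5.3, 4.7, 5.0]
diff, p = permutation_test(control, treated)
print(f"difference in means = {diff:.3f}, p ≈ {p:.4f}")
```

Permutation tests make few distributional assumptions, which is why they are used here; a t-test in a statistics package is a common alternative when its assumptions hold.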

Cluster analysis

Uses statistical techniques to group, classify and explore data to identify or uncover groups or structures within the data.
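k-means is one widely used clustering technique: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below is a minimal pure-Python version, assuming numeric points and a number of clusters k chosen in advance. It uses a simple deterministic farthest-first initialisation (many library implementations use the randomised k-means++ scheme instead), and the points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster numeric points into k groups with Lloyd's k-means algorithm."""
    # Deterministic farthest-first initialisation: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def assign(cents):
        # Index of the nearest centroid for every point.
        return [min(range(k), key=lambda i: math.dist(p, cents[i])) for p in points]

    for _ in range(max_iter):
        labels = assign(centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids no longer move: converged
        centroids = new_centroids
    return centroids, assign(centroids)


# Hypothetical 2D measurements forming two well-separated groups.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (8.1, 7.9), (7.8, 8.0)]
centroids, labels = kmeans(points, k=2)
print("labels:", labels)
print("centroids:", [tuple(round(v, 2) for v in c) for c in centroids])
```

Choosing k is itself an analytical decision; in practice you would compare several values and check that the resulting groups make sense for your data.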

Regression analysis

A statistical method for estimating how one or more predictor variables relate to an outcome variable, including the relative impact of each predictor on that outcome.
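For a single predictor, ordinary least squares (OLS) regression estimates the intercept and slope of the straight line that minimises the sum of squared differences between observed and fitted values. The sketch below assumes one numeric predictor and uses hypothetical dose and response values; the function name fit_line is illustrative. In Python 3.10 and later, statistics.linear_regression returns the same estimates.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares estimates for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx  # change in y per unit change in x
    intercept = y_bar - slope * x_bar  # fitted line passes through (x_bar, y_bar)
    return intercept, slope


# Hypothetical dose-response measurements.
dose = [0, 1, 2, 3, 4, 5]
response = [1.0, 3.1, 4.9, 7.2, 8.8, 11.1]
intercept, slope = fit_line(dose, response)
print(f"response ≈ {intercept:.2f} + {slope:.2f} × dose")
```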

Correlation analysis

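Includes statistical tests to measure the strength and direction of the relationship between two variables. A correlation on its own does not establish that one variable causes changes in the other.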
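A common measure of linear association is Pearson's correlation coefficient r: the covariance of two variables divided by the product of their standard deviations. It ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship). The sketch below computes r directly, using hypothetical paired measurements; in Python 3.10 and later, statistics.correlation gives the same value.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # co-variation
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # variation in x
    syy = sum((yi - y_bar) ** 2 for yi in y)  # variation in y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements.
temperature = [10, 12, 15, 18, 21, 25]
growth_rate = [0.8, 1.0, 1.3, 1.4, 1.8, 2.1]
print(f"r = {pearson_r(temperature, growth_rate):.3f}")
```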

Machine learning

Computer systems that use algorithms, statistical models and training data to identify patterns in data and make predictions about new data.
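As a minimal illustration of learning from training data, the sketch below implements a k-nearest-neighbours classifier, which predicts the label of a new observation from the majority label among its k closest training examples. It assumes numeric features, uses hypothetical and clearly separated data, and evaluates the predictions on a small held-out test set that was not used for training. The function name knn_predict is illustrative.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two measured features per sample, two classes.
train_X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

# Held-out test data, used only to evaluate the predictions.
test_X = [(1.1, 1.0), (0.9, 0.9), (5.2, 5.1)]
test_y = ["A", "A", "B"]

predictions = [knn_predict(train_X, train_y, q) for q in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print("predictions:", predictions, "accuracy:", accuracy)
```

Keeping test data separate from training data is what allows an honest estimate of how well the model predicts new data.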

Modelling analysis

Applies visual or mathematical models (algorithms) to data to represent a phenomenon, identify relationships among variables and make predictions.

Experimental analysis

Conducts statistical tests on data collected under controlled conditions.

Holistic data analysis

An approach in STEM developed from Indigenous ways of knowing and understanding the world, which incorporates diverse perspectives, contexts and factors.

Text analytics

Applies statistical and linguistic methods to extract information from, and classify, unstructured text.
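A common first step in text analytics is to split text into word tokens and count how often each term occurs, after removing very common "stop words". The sketch below assumes short English free-text notes (hypothetical examples) and uses a small, illustrative stop-word list and a crude letters-only tokeniser; dedicated text-analysis libraries provide fuller tokenisation, stop-word lists and classification methods.

```python
import re
from collections import Counter

# Illustrative stop-word list; real analyses use larger, curated lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "were", "with"}


def term_frequencies(documents):
    """Count lower-cased word tokens across documents, ignoring stop words."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())  # crude tokeniser: runs of letters
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


# Hypothetical free-text notes from a lab notebook.
notes = [
    "The sample was contaminated and the results were discarded.",
    "Contaminated reagents affected the sample results.",
    "Results improved with fresh reagents.",
]
counts = term_frequencies(notes)
print(counts.most_common(3))
```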

In silico modelling

Uses computer simulations to model a physiological process.
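As a minimal illustration of in silico modelling, the sketch below simulates first-order drug elimination in a one-compartment model, dC/dt = -kC, using the forward Euler method, and compares the simulated concentration with the analytic solution C(t) = C0 × e^(-kt). The initial concentration, rate constant and step size are hypothetical; real physiological models are usually far more detailed and use more accurate numerical solvers.

```python
import math


def simulate_elimination(c0, k, t_end, dt):
    """Forward-Euler simulation of dC/dt = -k * C (first-order elimination)."""
    times, conc = [0.0], [c0]
    steps = round(t_end / dt)
    for i in range(1, steps + 1):
        # Concentration falls in proportion to the current concentration.
        conc.append(conc[-1] - k * conc[-1] * dt)
        times.append(i * dt)
    return times, conc


# Hypothetical parameters: C0 = 10 (arbitrary units), k = 0.3 per hour.
times, conc = simulate_elimination(c0=10.0, k=0.3, t_end=10.0, dt=0.01)
exact = 10.0 * math.exp(-0.3 * times[-1])
print(f"simulated C(10) = {conc[-1]:.3f}, analytic C(10) = {exact:.3f}")
```

Comparing a simulation with a case that has a known exact answer is a simple way to check that the numerical method and step size are adequate before applying the model to harder problems.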

Do you think any of these data analysis methods might be suitable for your research? Could you combine any of these methods for a more complete understanding of your data?

There is a wide variety of data analysis methods beyond those mentioned here. When thinking about suitable methods, read around the subject and speak to your supervisors. Reflect on the different approaches available to address your research questions and weigh up their pros and cons. Where appropriate, consider mixed methods.

Further information on data analysis methods in different disciplinary areas can be found in the module Data analysis methods in Data Literacy for Research Methods. More information on the practical application of a variety of analysis methods used in the social sciences can be found in the module Data analysis methods in Research Methods in Practice: Social Sciences.

Quantitative studies involving numerical data allow researchers to conduct statistical tests. The most common methods use descriptive statistics to summarise data, or inferential statistics to draw conclusions about a population from a sample, for example by identifying statistically significant differences between groups. Whether such differences can be interpreted as causal depends on the study design, such as a controlled experiment. There are conventions for the presentation and interpretation of statistical data, and these aspects of the research process are often carried out using statistical software packages.
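Descriptive statistics summarise the centre and spread of a data set. The sketch below uses Python's standard statistics module to compute common summaries for a hypothetical list of measurements; the function name describe is illustrative. Note that statistics.stdev is the sample standard deviation (dividing by n - 1), and that statistics.quantiles uses its "exclusive" method by default, so quartiles may differ slightly from other software.

```python
import statistics


def describe(values):
    """Common descriptive statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles ('exclusive' method by default)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "q1": q1,
        "q3": q3,
    }


# Hypothetical measurements.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(measurements))
```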

Useful information

P-hacking refers to inappropriate data analysis practices (which may be linked to conscious or unconscious bias) that allow certain patterns in data to be presented as statistically significant. Examples include trying several statistical tests or parameter settings and reporting only the one that gives a significant result, or deciding when to stop data collection based on interim results.
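The effect of one p-hacking practice, stopping data collection as soon as a result becomes significant, can be illustrated by simulation. In the hypothetical set-up below, data are drawn from a population in which the null hypothesis is true (mean 0, known standard deviation 1), and a two-sided z-test is applied either once at a fixed sample size of 100, or after every batch of 10 observations with collection stopping at the first p < 0.05. Because there is no real effect, every "significant" result is a false positive. The function names and settings are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for H0: population mean = 0, with known standard deviation sigma."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(optional_stopping, n_sims=2000, batch=10, max_batches=10,
                        alpha=0.05, seed=1):
    """Proportion of simulated studies that reach p < alpha when H0 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if optional_stopping:
            # Test after every batch and stop as soon as p < alpha.
            sample = []
            for _ in range(max_batches):
                sample.extend(rng.gauss(0, 1) for _ in range(batch))
                if z_test_p(sample) < alpha:
                    hits += 1
                    break
        else:
            # Pre-specified design: collect all the data, then test once.
            sample = [rng.gauss(0, 1) for _ in range(batch * max_batches)]
            if z_test_p(sample) < alpha:
                hits += 1
    return hits / n_sims


fixed = false_positive_rate(optional_stopping=False)
peeking = false_positive_rate(optional_stopping=True)
print(f"false-positive rate, fixed sample size: {fixed:.3f}")
print(f"false-positive rate, optional stopping: {peeking:.3f}")
```

With a fixed, pre-specified sample size the false-positive rate stays close to the nominal 5%, whereas repeatedly testing and stopping at the first significant result pushes it well above that level.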

An outlier is an observation or value that appears to be inconsistent with the other data in a data set. Many statistical tests are sensitive to outliers because outliers affect the calculation of the mean and variance. Outliers can occur by chance or as a result of measurement or recording errors, for example. Alternatively, they may represent authentic observations. It is therefore important to investigate the nature of an outlier before deciding whether to exclude it. Note that different statistical analyses vary in their sensitivity to outliers, and some account for perceived outliers better than others (e.g. distributional models, if we believe that there is more than one data generation process).
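One common convention for flagging potential outliers is Tukey's fences: values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below flags such values for investigation rather than deleting them, and reports summary statistics with and without them so their influence is visible. The data, the function name and the 1.5 multiplier are illustrative assumptions; quartile conventions differ between software packages, so the exact fences may vary.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values lying outside Tukey's fences (k * IQR beyond the quartiles)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical repeated measurements with one unusual value.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 18.7]
flagged = tukey_outliers(data)
kept = [v for v in data if v not in flagged]

# Flag for investigation, and report results both with and without.
print("flagged for investigation:", flagged)
print(f"mean with all values:    {statistics.mean(data):.2f}")
print(f"mean without flagged:    {statistics.mean(kept):.2f}")
print(f"median with all values:  {statistics.median(data):.2f}")
```

The median barely changes while the mean shifts noticeably, which illustrates why some statistics are more sensitive to outliers than others.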

You can find a description of some common statistical tests and how to choose the appropriate statistical technique on the Emerald Publishing website.

Consider the following scenario, then make a note of your response to the question.

Researcher 1: I really liked the presentation of your sampling analysis! There was one thing I wanted to ask you – I noticed an outlier in one of your graphs. Why didn't you exclude it?

Researcher 2: I couldn't identify any data collection errors or sampling problems, so I attributed it to natural variation. But this outlier seems to be distorting my latest results. I'm wondering whether I should collect more data to achieve statistical significance, or exclude it.

Researcher 3: That is known as p-hacking and, as a rule, it's not good practice. You need to investigate the underlying cause. Did you perform a power calculation at the start to calculate the number of samples you would need?

Researcher 2: I did.

Researcher 1: Then you need to revisit your research design to be sure that nothing has gone wrong.

Later that day...

Researcher 2 thinks: Did anything unusual happen when I collected the data? Did I note anything different about that observation? Did the control samples perform as expected? Or might the outlier represent a genuine subgroup of the population?

At a meeting with the supervisor...

Supervisor: Have you investigated this outlier?

Researcher 2: Yes. I could not identify any measurement errors and I didn't make any data entry mistakes.

Supervisor: That is very important to check. In that case, try running the analysis with and without the outlier, report both upfront and provide your interpretation of how it influences the results.
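In the scenario, Researcher 3 asks whether a power calculation was performed. For comparing the means of two groups, a common approximation for the sample size needed per group is n = 2 × (z_(1-α/2) + z_(1-β))² × σ² / δ², where α is the significance level, 1 - β is the desired power, σ is the standard deviation and δ is the smallest difference worth detecting. The sketch below assumes normally distributed outcomes with the same known σ in both groups; because it uses the normal rather than the t distribution it slightly underestimates the required sample size, so dedicated power-analysis software is preferable for real studies. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole number of samples


print(n_per_group(delta=1.0, sigma=1.0))
print(n_per_group(delta=0.5, sigma=1.0))
```

Under these assumptions, detecting a difference of one standard deviation with 80% power at α = 0.05 needs about 16 samples per group, and halving the difference to be detected roughly quadruples the requirement, to 63 per group.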

Question:

Reflect on your own data analysis plan. What are the issues that might arise? Prepare a list to discuss with your supervisors and peers. Holding this type of discussion is extremely important and will help you to clarify and hone your choices.

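Research in STEM disciplines must stand up to rigorous questioning. In quantitative research, statistical tests help researchers assess whether findings are reliable and likely to be reproduced in independently repeated trials. Statistical significance is typically assessed using a p-value: the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis (for example, no difference between groups) is true. A p-value is not the probability that a finding is due to chance.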
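The definition of a p-value can be checked by simulation. The sketch below assumes a one-sample z-test with a hypothetical observed test statistic z = 2.1. It compares the analytic two-sided p-value with the proportion of test statistics, simulated under the null hypothesis, that are at least as extreme as the observed one. The function names are illustrative.

```python
import random
from statistics import NormalDist


def two_sided_p(z):
    """Analytic two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_p(z_obs, n_sims=100_000, seed=2):
    """Share of test statistics simulated under H0 that are at least as extreme as z_obs."""
    rng = random.Random(seed)
    extreme = sum(abs(rng.gauss(0, 1)) >= abs(z_obs) for _ in range(n_sims))
    return extreme / n_sims


z_obs = 2.1  # hypothetical observed test statistic
print(f"analytic p = {two_sided_p(z_obs):.4f}, simulated p = {simulated_p(z_obs):.4f}")
```

Both numbers describe how often data this extreme would arise if the null hypothesis were true; neither tells you the probability that the null hypothesis itself is true.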

While qualitative studies typically involve text, image or case analysis, rigour and accuracy are still required so that the work can be evaluated. All researchers are expected to apply principles of research integrity.