EDA: The process of investigating, organizing, and analyzing datasets and summarizing their main characteristics, often employing data wrangling and visualization methods. The six main practices of EDA are: discovering, structuring, cleaning, joining, validating, and presenting.

Discovering in EDA refers to gaining an initial understanding of the dataset, including its overall shape, size, and content.

The EDA process

The six practices of EDA are iterative and non-sequential

Exploratory data analysis (EDA) is not like a cake recipe. It is not a step-by-step process you follow. Instead, the six practices of EDA are iterative and non-sequential.

Because datasets vary so much, the approach to exploring each one will be different. That means you will need to use your logic and experience throughout the EDA process to decide which of the six practices to use, how many times to apply them, and at what point in the process to apply them.

Visual example

Imagine you are assigned a dataset that has only 200 rows and five columns of data about trees in a coniferous forest in Norway. You know that to complete your full analysis you’ll need more than 1,000 rows and at least two more columns. Even without much more detail than that, your entire EDA process might look something like this:


  1. Discovering: You check out the overall shape, size, and content of the dataset. You find it is short on data.
  2. Joining: You add more data.
  3. Validating: You perform a quick check that the new data doesn’t have mistakes or misspellings.
  4. Structuring: You structure the data in different time periods and segments to understand trends.
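The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed method: the tree records, the column names (`tree_id`, `species`, `height_m`, `diameter_cm`, `year`, `soil_type`, `elevation_m`), the `KNOWN_SPECIES` list, and the helper functions are all hypothetical, and the thresholds simply encode "more than 1,000 rows and at least two more columns." It uses only the standard library so it runs on its own; in practice a dataframe library would usually do this work.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical targets from the scenario: more than 1,000 rows and at
# least seven columns (the original five plus at least two more).
MIN_ROWS = 1000
MIN_COLS = 7

# Hypothetical reference list of valid species names for the spelling check.
KNOWN_SPECIES = {"Norway spruce", "Scots pine"}


def make_rows(start_id, count):
    """Build hypothetical tree records with five columns each."""
    species = ["Norway spruce", "Scots pine"]
    return [
        {"tree_id": i, "species": species[i % 2], "height_m": 10 + i % 15,
         "diameter_cm": 20 + i % 30, "year": 2020 + i % 3}
        for i in range(start_id, start_id + count)
    ]


def discover(rows):
    """Discovering: report size and shape, and whether the data falls short."""
    columns = sorted({col for row in rows for col in row})
    short = len(rows) <= MIN_ROWS or len(columns) < MIN_COLS
    return {"n_rows": len(rows), "n_cols": len(columns), "short_on_data": short}


def join(rows, extra_rows, extra_cols, key="tree_id"):
    """Joining: append extra rows, then left-join extra columns on `key`."""
    lookup = {rec[key]: rec for rec in extra_cols}
    new_cols = sorted({col for rec in extra_cols for col in rec if col != key})
    joined = []
    for row in rows + extra_rows:
        match = lookup.get(row[key], {})
        merged = dict(row)  # copy so the inputs are not modified
        for col in new_cols:
            merged[col] = match.get(col)  # None marks a row with no match
        joined.append(merged)
    return joined


def validate(rows):
    """Validating: flag unknown (possibly misspelled) species and missing values."""
    problems = []
    for i, row in enumerate(rows):
        if row["species"] not in KNOWN_SPECIES:
            problems.append((i, "unknown species", row["species"]))
        missing = [col for col, value in row.items() if value is None]
        if missing:
            problems.append((i, "missing values", missing))
    return problems


def structure(rows):
    """Structuring: mean height per (year, species) segment."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["year"], row["species"])].append(row["height_m"])
    return {seg: mean(heights) for seg, heights in sorted(groups.items())}


# Walk through the four steps from the example.
original = make_rows(0, 200)
print(discover(original))  # 200 rows, 5 columns, so short_on_data is True

more_rows = make_rows(200, 900)          # hypothetical extra survey rows
more_rows[0]["species"] = "Scots pin"    # a deliberate typo to catch later
more_cols = [{"tree_id": i, "soil_type": "podzol", "elevation_m": 300 + i % 50}
             for i in range(1100)]       # hypothetical extra columns
data = join(original, more_rows, more_cols)
print(discover(data))  # 1,100 rows, 7 columns

problems = validate(data)
print(problems)  # the misspelled species at row index 200

data[200]["species"] = "Scots pine"  # fixing it is really a cleaning step
trends = structure(data)
print(len(trends), "segments")
```

Each practice is a separate function so it can be called in any order or repeated, which mirrors the iterative, non-sequential nature of EDA: here the typo found while validating sends you to a quick cleaning fix before structuring.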