Preprocessing, EDA and Forecast of a Short Time Series

Short Time Series Analysis

Trapped insects forecast based on meteorological data

This was a group project (with Leonardo Maria Marra), part of the Information Systems and Business Intelligence exam taken at the University of Naples Federico II.

Objective

The goal is to predict the number of insects caught, or whether a catch occurs at all, using meteorological data together with the catches recorded in previous time intervals.
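As a minimal sketch of what "catches in previous time intervals" can look like as model inputs, the snippet below builds lagged copies of the catch column next to the weather columns. The helper name `add_lags`, the choice of lags (1 and 2 days) and the sample values (rounded from the merged dataset shown in the Preprocessing section) are assumptions made for illustration, not necessarily the setup used in the project.

```python
import pandas as pd

# Illustrative daily data (values rounded from the merged dataset example).
daily = pd.DataFrame(
    {
        "New captures": [0, 0, 0, 1, 0],
        "Temperature": [28.24, 26.89, 25.89, 21.65, 23.00],
        "Humidity": [61.43, 64.88, 64.84, 83.21, 87.47],
    },
    index=pd.date_range("2024-08-17", periods=5, freq="D"),
)


def add_lags(frame, column, lags):
    """Hypothetical helper: append shifted copies of `column` as new features."""
    out = frame.copy()
    for lag in lags:
        # shift(lag) moves values forward in time, so each row sees past catches.
        out[f"{column}_lag{lag}"] = out[column].shift(lag)
    # The first max(lags) rows have no history and are dropped.
    return out.dropna()


features = add_lags(daily, "New captures", lags=[1, 2])
print(features)
```

Each remaining row then pairs the current weather with the catches of the previous one and two days, which is the kind of input a regression-style model can consume.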


Data loading

The data are time series of insect catches in specific areas of the territory, together with the related meteorological data. The data was provided in .xlsx format and, after being appropriately transformed*, was saved in .csv format.

* The transformation is described in the Preprocessing section.


Preprocessing

At this stage the data is properly prepared for the next steps.

The data was initially split into separate files: for each of the 5 areas, one file contains the meteorological data and another contains the catches data.

Catches Data

DateTime    Total captured  New captures  Reviewed  Event
17.07.2024  1               1             Yes       /
18.07.2024  1               1             Yes       /
19.07.2024  1               0             Yes       /
20.07.2024  1               0             Yes       /
20.07.2024  /               /             Yes       Cleaning
...         ...             ...           ...       ...

Meteorological Data

DateTime             Temperature  Humidity
17.07.2024 00:00:00  20,98        73,22
17.07.2024 01:00:00  23,74        59,92
17.07.2024 02:00:00  21,48        66,22
17.07.2024 03:00:00  19,62        72,62
17.07.2024 04:00:00  18,26        77,29
...                  ...          ...

Note the difference in sampling rate between the two datasets: catches are recorded once per day, while the meteorological readings are hourly.

The transformation mentioned above consists of:

  • Removal of irrelevant columns and/or records concerning insect capture (Cleaning, Reviewed);
  • Exclusion of redundant columns concerning weather information;
  • Calculation of daily weather data and association with catch data.
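The steps above can be sketched with pandas roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the inline samples, the `;` separator, the column name `Humidity`, and the use of a daily mean as the aggregation are all assumed here.

```python
import io

import pandas as pd

# Hypothetical inline samples mimicking the two raw tables shown above.
catches_raw = io.StringIO(
    "DateTime;Total captured;New captures;Reviewed;Event\n"
    "17.07.2024;1;1;Yes;/\n"
    "18.07.2024;1;1;Yes;/\n"
    "18.07.2024;/;/;Yes;Cleaning\n"
)
weather_raw = io.StringIO(
    "DateTime;Temperature;Humidity\n"
    "17.07.2024 00:00:00;20,98;73,22\n"
    "17.07.2024 01:00:00;23,74;59,92\n"
    "18.07.2024 00:00:00;21,48;66,22\n"
    "18.07.2024 01:00:00;19,62;72,62\n"
)

catches = pd.read_csv(catches_raw, sep=";")
# decimal="," converts values such as "20,98" into floats.
weather = pd.read_csv(weather_raw, sep=";", decimal=",")

# 1. Drop event records (e.g. Cleaning) and the columns not needed for modelling.
catches = catches[catches["Event"] == "/"].drop(columns=["Reviewed", "Event"])
catches["DateTime"] = pd.to_datetime(catches["DateTime"], format="%d.%m.%Y")
for col in ["Total captured", "New captures"]:
    catches[col] = pd.to_numeric(catches[col])

# 2. Aggregate the hourly weather readings to one row per day (mean assumed).
weather["DateTime"] = pd.to_datetime(weather["DateTime"], format="%d.%m.%Y %H:%M:%S")
daily_weather = weather.set_index("DateTime").resample("D").mean().reset_index()

# 3. Associate the daily weather with the catch data.
daily = catches.merge(daily_weather, on="DateTime", how="left")
print(daily)
```

Resampling before merging brings both tables to the same daily granularity, so each catch record is matched with exactly one weather row.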

This results in datasets having the following structure:

DateTime    Total captured  New captures  Temperature  Humidity
2024-08-17  0               0             28.244167    61.428750
2024-08-18  0               0             26.890000    64.881250
2024-08-19  0               0             25.890417    64.844167
2024-08-20  1               1             21.650417    83.205417
2024-08-21  1               0             23.003750    87.472083
...         ...             ...           ...          ...

Exploratory Data Analysis

This section shows some of the plots produced to explore, understand and extract visual insights from the provided data.

Interactive versions of the plots are available on the hosted dashboard.

Exploring variable relationships

This graph shows the relationship between Temperature, Humidity and the number of Catches:

Most Catches appear to occur within a specific range of conditions:

  • 25°C < Temperature < 29°C
  • 50% < Humidity < 73%

These intervals may represent the insects' preferred weather conditions.
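One way to put a number on this visual impression is to compute the share of catches that fall inside both ranges. The sketch below does so on made-up values; the data frame contents and column names are assumptions for illustration only, and `inclusive="neither"` requires pandas 1.3 or later.

```python
import pandas as pd

# Made-up daily records, only to illustrate the check.
daily = pd.DataFrame(
    {
        "New captures": [0, 2, 1, 1, 0, 3],
        "Temperature": [30.1, 27.5, 26.0, 21.6, 23.0, 28.2],
        "Humidity": [45.0, 60.0, 70.0, 83.2, 87.5, 55.0],
    }
)

# Strict inequalities, matching 25°C < Temperature < 29°C and 50% < Humidity < 73%.
in_range = (
    daily["Temperature"].between(25, 29, inclusive="neither")
    & daily["Humidity"].between(50, 73, inclusive="neither")
)

# Fraction of all catches that happened on days inside both ranges.
share = daily.loc[in_range, "New captures"].sum() / daily["New captures"].sum()
print(f"Share of catches inside the ranges: {share:.2%}")
```

A share close to 1 on the real data would support the reading that these ranges describe the conditions in which the insects are most active.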


Results

Multiple models were trained on this data. The hosted dashboard contains plots of each model's forecast for each of the time series.

The following table sums up the results in terms of the evaluation metric used (RMSE):

Model          Cicalino 1  Cicalino 2  Cicalino Merged  Imola 1  Imola 2  Imola 3  Imola Merged
Decision Tree  0.77        0.33        0.89             2.05     0.63     1.00     2.57
ARIMAX         0.00        0.33        0.32             2.05     0.00     1.00     2.76
LSTM           0.00        0.33        0.32             2.05     0.00     0.00     2.76
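For reference, RMSE is the square root of the mean squared difference between observed and forecast values, so lower is better and it is expressed in the same unit as the catches. The snippet below is a minimal sketch of the metric with NumPy; the function name `rmse` and the sample values are illustrative and do not come from the project.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and forecast values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only: observed vs. forecast catches over four days.
observed = [0, 1, 0, 2]
forecast = [0, 1, 1, 0]
score = rmse(observed, forecast)
print(round(score, 2))
```

Under this definition, a value of 0.00 in the table means the forecast matched the observations in the evaluation window, at least to two decimal places.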

Links