Pollen Prediction

This project implements a complete pipeline for airborne pollen forecasting using several time-series models. The pipeline evaluates both pre-trained foundation models (Moirai and Chronos, used directly for inference) and non-foundation models (NHits and NBeats) that must be trained on the provided dataset.

📂 Directory Structure

datasets/ → Directory where the user can store the dataset (optional). The original dataset was removed to comply with confidentiality agreements.
notebooks/ → Contains all Jupyter notebooks used throughout the pipeline.
requirements/ → Includes several requirements files with dependencies needed for each model. They can be installed easily using pip.
src/ → Contains the ppf package, which groups Python modules with helper functions for the analysis.
outputs/ → Stores predictions, evaluations, execution times, and plots generated during the workflow.

⚙️ Setup

The main code is located in the src/ppf directory.
Install a compatible Python version (tested with Python 3.10.11).
Install a compatible CUDA version (tested with CUDA 12.4).
Create a virtual environment for each model.
Install a compatible PyTorch version inside each environment. Example for Windows with CUDA 12.4:

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

Install the dependencies for each model (Moirai, Chronos, NHits, and NBeats) from the corresponding files in requirements/:

pip install -r requirements/<model_name>.txt
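
After installation, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch and is not part of the repository: `environment_report` is a hypothetical helper name, and it only reports what it finds without enforcing any version.

```python
import importlib.util
import platform


def environment_report():
    """Collect version info for the active environment (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_available": None,
        "cuda_version": None,
    }
    # Import torch only if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        report["cuda_version"] = torch.version.cuda  # None on CPU-only builds
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Run it once inside each model's virtual environment. If `torch` is reported as `None`, the PyTorch install step has not been applied to that environment.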

Usage

Activate the virtual environment corresponding to the model you want to test.
Launch Jupyter Notebook or JupyterLab:

jupyter lab

Navigate to the notebooks/ directory and open the notebook you want to run.

Dataset

Some notebooks cannot be executed as-is because they rely on the original dataset (AlnusOurense9322.csv) or on predictions produced by models that were removed. The base code remains fully functional: to run all notebooks again, you only need to add a new dataset with the same structure.

To use a different dataset:

Place the new dataset inside the datasets/ directory.
If the dataset is not already in CSV format, you have two options:
1. Convert your dataset to CSV.
2. Modify the code so that your dataset is loaded and stored as a DataFrame (instructions below).
In subsection 3.1.3 Get the pollen time series of the notebooks:
- 02-01_Forecasting_with_Moirai
- 02-02_Forecasting_with_Chronos
- 02-03_Forecasting_with_NHiTs_one_year_train
- 02-04_Forecasting_with_NHiTs_five_years_train
- 02-05_Forecasting_with_NBeats_one_year_train
- 02-06_Forecasting_with_NBeats_five_years_train

Replace the dataset path with the path to your file. If stored in datasets/, it should look like:

df = pd.read_csv("../datasets/<file_name>.csv")

Alternatively, you may replace this line with code that converts your dataset into a DataFrame.
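
As an illustration of such a replacement, the sketch below reads a semicolon-delimited export whose columns have different names and renames them to the ones the notebooks use. It is only an example under stated assumptions: `load_pollen_frame`, `COLUMN_MAP`, and the source column names (`Fecha`, `Polen`, and so on) are hypothetical, and the in-memory sample stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical mapping from source column names to the names the notebooks expect.
COLUMN_MAP = {
    "Fecha": "date",
    "Polen": "pollen",
    "Lluvia": "rain",
    "TMax": "tmax",
    "TMin": "tmin",
    "TMed": "tmed",
}


def load_pollen_frame(source, sep=";", column_map=COLUMN_MAP):
    """Read a delimited file (path or file-like object) and rename its columns."""
    df = pd.read_csv(source, sep=sep)
    df = df.rename(columns=column_map)
    # Keep only the expected columns, in a fixed order.
    return df[list(column_map.values())]


# Tiny in-memory sample; in a notebook, pass the path to your own file instead.
sample = io.StringIO(
    "Fecha;Polen;Lluvia;TMax;TMin;TMed\n"
    "1993-01-01;0;1.2;11.5;3.2;7.4\n"
    "1993-01-02;3;0.0;12.1;4.0;8.0\n"
)
df = load_pollen_frame(sample)
print(df)
```

For other formats, pandas provides readers such as `pd.read_excel` and `pd.read_parquet`, which need optional dependencies to be installed.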

The expected dataset structure includes the following columns with daily samples from 1993 to 2023: "date", "pollen", "rain", "tmax", "tmin", "tmed". You may adapt these variables in subsections 3.1.1 Constants and 3.1.2 Arguments of the notebooks listed above, and you are free to modify any function parameters across the notebooks. For details on how the code works, refer to the source files in src/ppf.
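
Before running the notebooks on a new dataset, a quick structural check may save time. The sketch below is an assumption about what "same structure" means in practice (the six columns listed above, one row per calendar day with no gaps or duplicates); `check_structure` is a hypothetical helper and does not exist in src/ppf.

```python
import pandas as pd

EXPECTED_COLUMNS = ["date", "pollen", "rain", "tmax", "tmin", "tmed"]


def check_structure(df):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need all columns
    dates = pd.to_datetime(df["date"])
    if dates.duplicated().any():
        problems.append("duplicate dates")
    # Daily sampling: every calendar day between the first and last date is present.
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    gaps = full_range.difference(pd.DatetimeIndex(dates))
    if len(gaps) > 0:
        problems.append(f"{len(gaps)} missing day(s), first: {gaps[0].date()}")
    return problems


# Small sample with one missing day (2020-01-03) to show the check in action.
sample = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-04"],
    "pollen": [0, 5, 2],
    "rain": [0.0, 1.5, 0.0],
    "tmax": [12.0, 11.0, 13.5],
    "tmin": [3.0, 4.5, 2.0],
    "tmed": [7.5, 7.8, 7.7],
})
print(check_structure(sample))
```

Whether the notebooks tolerate gaps in the daily series is not stated here, so treat a reported gap as a prompt to inspect the data rather than as a definite error.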

