Skip to main content

Spatio-Temporal Change Point Characterization in Tropical Forests using a Hybrid Pipeline

ABSTRACT

The Amazon Forest sequesters vast amounts of carbon, making it one of the planet’s largest terrestrial carbon sinks, unfortunately, illegal deforestation and climate-related deterioration threaten this critical ecosystem. Current historical analysis of forests through satellite imagery often results in misclassification due to seasonal variations, cloud cover, and sensor anomalies. This study proposes a Novel hybrid pipeline for break point characterization using a Break for Additive Seasons and Trends (BFAST) time series algorithm combined with a Convolutional Neural Network (CNN) and Extreme Gradient Boosting (XGB) architecture trained on a sophisticated synthetic dataset leveraging Deep Learning techniques. This system gains an accuracy of ~97% in characterizing changes into 5 major categories, demonstrating the potential as proof-of-concept of such a system in forest monitoring.

INTRODUCTION.

Tropical rainforests are crucial in controlling carbon emission levels and climate regulation. Tropical rainforests sustain the lives of millions of tribals, yet these forests are constantly under threat due to illegal deforestation and late or limited actions taken to resolve them. Historical system to classify such changes or monitor them fall short on a variety of areas. Conventional approaches rely on indices such as the Normalized Difference Vegetation Index [1] (NDVI) and utilize algorithms like Breaks for Additive Seasonal and Trend [2] (BFAST) to identify structural shifts in time series. While powerful, these algorithms often fall short in such regions due to cloud cover, complex seasonality, and/or spectral changes due to degradation or regrowth. Recent advancements in algorithms like the Convolutional Neural Network (CNN) seem apt in categorizing spatial and spectral complexity [3]. However, such systems require massive amounts of labelled data for accurate predictions; such datasets are rarely available to researchers, enterprises, or government agencies

This work proposes a hybrid pipeline for Novel Break Point Characterization in such tropical ecosystems with the Amazons as a case study as current systems for this study area fall short in many ways [4]. This system enhances the traditional BFAST with modifications such as Huber regression [5], Savitzky–Golay filtering [6] for noise reduction while preserving signal shape, a Multi index cross validation across a variety of indices, etc. A CNN with general spectral and spatial embeddings was used. The features extracted by the CNN were combined with the advanced BFAST output to be classified by an Extreme Gradient Boosting (XGB) [7] model for multi class detection.

MATERIALS AND METHODS.

Data acquisition and preprocessing.

To rigorously evaluate the model before using large scale satellite archives and due to the limited availability of labelled data, a synthetic dataset was constructed [8] which allowed for reproducibility, systematic stress testing, and the integration of disturbance scenarios without reliance on noisy raw imagery.

Spectral Trajectories were simulated for individual pixels, the baseline was constructed using sinusoidal seasonal curves with realistic tropical forest periodicity and random noise was interjected to replicate sensor artifacts, cloud contamination, and natural variability. The classes were modelled as Major Deforestation (Abrupt changes/Permanent decline in NDVI/NBR), Moderate deforestation (partial Canopy loss), Minor degradation, Regrowth/Recovery (shows a Gradual increase following a disturbance) and stable forest. Stable (Consistent seasonal) noise and random outliers were added to replicate any cloud shadowing, atmospheric interference, etc. These are critical for the testing of the formulation of the Huber regression and Savitzky-Golay Smoothing [9].

The BFAST pipeline extracted 25+ temporal descriptors per pixel which included break magnitude recovery duration, residual variance, and cross index correlation. This provided a feature space 180% richer than baseline BFAST.

BFAST time series decomposition.

A classical Breaks for Additive Seasonal and Trend (BFAST) was modified to address challenges specific to tropical forests. A Huber Regression system was adopted to reduce sensitivity to outliers from cloud cover and sensor noise. Savitzky-Golay smoothing was adopted to smooth the residuals while still preserving the breakpoints. Density based outliers were treated with distinct multipliers for dense, medium, and sparse forests to improve the classification of sensitivity. Change points were only confirmed if they were detected across a variety of indices, including NDVI, NBR and NDMI as each index has different ecological parameters, so combining them would lead to lower misclassification and capture a broader range of real changes [10]. 28 temporal descriptors were extracted which included recovery time, persistence duration, etc. This system allows the pipeline to identify genuine breakpoints while also filtering out noise. It allows the model to capture not just the presence of some form of change but also its persistence, producing descriptors, recovery trajectories, and the severity of canopy distribution. This allows the BFAST to act as a temporal backbone on the system.

Spatial and spectral feature extraction using CNN embeddings.

Local spectral and spatial context was captured by a custom trained CNN to work with the temporal signals. The CNN embeddings serve as high-dimensional descriptors.

The CNN was trained on a synthetic dataset generated from the BFAST with each sample encoding complex vegetation dynamic derived from the simulated NDVI, NBR, NDMI, and EVI signals. This setup provided precise supervision across all the change classes. The CNN embeddings learn features which are tied to well characterized disturbance patterns as the synthetic generator defines both the timing and type of change. The CNN was implemented as a series of sequential convolutional and pooling layers, moreover the architecture was made to be lightweight which prevents overfitting. The final layer produced a dense feature vector which was approximately 512 dimensions in this prototype which encoded abstract attributes. These were later concatenated with the temporal descriptors derived from the BFAST and ancillary spatial features. Seasonal cycles, stochastic noise, and random disturbances were injected into the simulated dataset itself, thus forcing the CNN to generalize across a wide range of unfavorable and favorable conditions. Reliability here was achieved directly through systematic variability embedded into the synthetic generator itself. Figure 1 illustrates the structure of the flow of the Convolutional Neural Network (CNN) embeddings used in this work.

Figure 1. For each pixel, the NDVI time series statistics and spatial coordinates were mapped to a simulated feature vector which served as mock embeddings.

Extreme Gradient Boosting (XGB).

The feature vectors from the BFAST and CNN were concatenated and later classified by the XGB. The input feature vector consisted of 20+ time-based descriptors taken from the BFAST pipeline which includes seasonal amplitudes, recovery duration, inter index correlation, and break persistence. The synthetic dataset was generated using canopy disturbances into a Landsat like time series. The model’s hyper parameters were later tuned to balance generalization and sensitivity. Bias-variance tradeoff was controlled through learning rate tuning and stochastic subsampling so that the model generalized effectively, randomness was introduced to reduce the correlations between the individual trees in the model, addressed imbalances between change and no change. A cross-index voting system was enforced to consider a joint index response. For example: A true response was only considered if the NDVI decline was consistent with NBR loss and NDMI moisture shifts. This reduced false positives by ~25%

The final output was a pixel-wise probability map of the forests to change, with confidence intervals derived from variance. By providing pixel wise probability maps with confidence intervals, the system could not only identified disturbance but also quantify uncertainty

Another enhancement was to use probability calibration using sigmoid mapping and aligning predicting class probabilities with true likelihoods. The XGB model also provided feature importance ranking which served as an interpretive tool. Figure 2 shows the representation of the model, illustrating its architecture as a modular flowchart starting from the input synthetic data and the output values.

Figure 2. The simulated multi-Index time series was processed by the enhanced BFAST, concatenates them with simulated CNN embeddings and ancillary spatial features. All features were then classified by the Extreme Gradient Boosting mechanism to predict forest change classes.

RESULTS.

Overall performance.

The AI pipeline went through training with an accuracy being 96.7%. The weighted F-1 score of 0.967. Probability Calibration was also applied post-training to ensure that predicted probabilities reflect the true likelihoods. The classification report is shown in Table 1.

Table 1.    Displays the value of metrics for all the 5 classes
Metrics Major Deforestation Moderate Deforestation Minor Degradation Regrowth Stable forest
F-1 score 0.94 0.91 0.97 1.00 1.00
Recall 0.92 0.92 0.97 1.00 1.00
Precision 0.95 0.90 0.97 1.00 1.00

 Macro-averaged scores indicate a very balanced performance across all classes. A change rate of 0.3% was also detected. The high accuracy and F-1 scores were a result of applied calibration which ensures that probabilities match their true likelihood. The higher accuracy as compared to previous works highlights this model’s ability to adapt into real world operations. Moreover, the reliability of the performance metrics shows a relatively low variance between precision and recall across all 5 classes.

Per class F-1 scores and confusion matrix.

The per-class F-1 scores show the strengths and limitations of the model. Regrowth/Recovery and Stable Forest achieved a perfect score of 1.00, highlighting the model’s ability to distinguish between no change and regeneration cases. Major Deforestation and Moderate Deforestation F-1 scores of 0.93 and 0.91 show a strong performance yet indicates that most confusion occurs between these 2 categories. This seems to align with ecological reality with such events sharing overlapping temporal and spatial signatures. Minor Degradation performs exceptionally well with an F-1 score of 0.96, which shows disturbances as canopy thinning. Fig. 3, shows that the confusion matrix aligns with the description of values and f-1 scores presented above. Demonstrating that this pipeline is not only proficient in forest change classification but is also ecologically consistent.  Moreover, the consistency of the F-1 scores shows the robustness of the pipeline. Yet further iterations could incorporate multi sensor fusion to better separate partial and complete canopy removal.

Figure 3. Most errors occurred between Major and Moderate deforestation (36 misclassifications), which reflects their spatio-temporal similarity. Minor degradation was also sometimes confused with Moderate deforestation though with a lower frequency

The high F-1 scores for regrowth and stable forest classes demonstrate the model’s ability, which can be interpreted as being highly ecologically significant. Moreover, the per-class results show that no single class dominates the others in predictions. The near-equal strength across all the classes shows that the model is not overfitting to the majority(stable) classes.

DISCUSSION.

Contribution of this study.

The novel system devised in this study demonstrates the possible potential viability of such a hybrid architecture in forest change classification. The carefully designed synthetic dataset enables controlled evaluation of the algorithm under diverse conditions where ground-truth might be scarce. Moreover, the enhanced BFAST shows the ability to capture persistence, recovery, breaks, etc., greater than traditional trend/seasonal breaks. Moreover, higher achieved accuracy and f-1 scores as compared to more traditional monitoring systems which typically operate at a range of 70-85%.

Limitations and future work.

The novel contribution of this model provides us with a proof-of-concept architecture for advanced classification of forest changes that may show further potential in the future in regard to real world use, yet for complete adoptions, classification of some classes must be improved as indicated by their F-1 scores. These classes include Major and Moderate Deforestation, which must be improved to gain a slightly higher F-1 score and accuracy. Later research could include integration of real-world labelled data using tools like Google Earth Engine or Landsat to validate the model’s performance using real-world data, though creating such a brief dataset is both time consuming and intensive. Moreover, Additional research and improvements can be added to increase model effectiveness. Furthermore, additional optimizations, such as multi-sensor fusion, could improve separation of partial and complete canopy loss and enable more change classes.

SUPPORTING INFORMATION.

The pipeline was implemented in google Colab due to local hardware limitations. The full code is available at https://github.com/98subharun/Spatio-Temporal-Change-Point-Characterization-in-Tropical-Forests

REFERENCES

  1. J. Tucker, Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 8, 127–150 (1979).
  2. Verbesselt et al., Detecting trends and seasonal changes in satellite image time series. Remote Sens. Environ. 114, 106–115 (2010).
  3. M. Jelas et al., Deforestation detection using deep learning-based semantic segmentation techniques: a systematic review. Front. For. Glob. Change 7, 1379482 (2024).
  4. Mullan et al., Estimating the value of near-real-time satellite information for monitoring deforestation in the Brazilian Amazon (Resources for the Future Working Paper, 2022).
  5. J. Huber, Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101 (1964).
  6. Savitzky, M. J. E. Golay, Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627–1639 (1964).
  7. Chen, C. Guestrin, XGBoost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (2016).
  8. Grondin et al., Training deep learning algorithms on synthetic forest images for tree detection. arXiv:2210.04104 [cs.CV] (2022).
  9. Liu et al., A method for reconstructing NDVI time-series based on envelope detection and the Savitzky-Golay filter. Int. J. Digit. Earth 15, 553–581 (2022).
  10. Zhou et al., Integration of Landsat time-series vegetation indices for abrupt and gradual vegetation change detection. Int. J. Digit. Earth 16, 1459–1481 (2023).

 



Posted by on Tuesday, May 19, 2026 in May 2026.

Tags: , , , ,