Drone-based AI system detects agricultural fires faster, smarter and more accurately

CO-EDP, VisionRI | Updated: 23-05-2025 23:09 IST | Created: 23-05-2025 23:09 IST
Representative Image. Credit: ChatGPT

A newly published study has unveiled a high-performance artificial intelligence model that significantly enhances the early detection of agricultural fires using a novel architecture that merges transformers with advanced convolutional backbones. The study, titled “AI-Driven Boost in Detection Accuracy for Agricultural Fire Monitoring” and published in Fire, presents an optimized version of the Detection Transformer (DETR) that integrates a ConvNeXt backbone and a Feature Enhancement Block (FEB) to elevate the accuracy and responsiveness of fire detection systems in rural landscapes.

The research responds to the growing threat of fires in agricultural areas, which, driven by both natural and anthropogenic causes, continue to disrupt food security, damage infrastructure, and harm ecosystems. Current fire detection solutions often fail under variable weather and lighting conditions or when fires are in their incipient stages. The new system offers a leap forward in precision, recall, and responsiveness, outperforming leading models such as YOLOv9s and the baseline DETR framework.

How does the new AI system improve fire detection in agriculture?

The core innovation lies in replacing the traditional DETR backbone with ConvNeXt, a convolutional neural network inspired by the design principles of vision transformers. This backbone is tailored to extract complex spatial and contextual features across multiple hierarchical stages. The researchers removed ConvNeXt’s classification head and introduced a Feature Enhancement Block (FEB), a module that applies layered convolution, normalization, and activation to refine visual representations before object detection is executed by the transformer module.
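To make the architecture concrete, the PyTorch sketch below shows one plausible way to pair a ConvNeXt feature extractor (with its classification head removed) with a feature enhancement block built from stacked convolution, normalization, and activation layers. The module names, channel sizes, block depth, normalization choice, and residual connection are illustrative assumptions; the article does not specify the authors' exact configuration.

```python
# Minimal sketch (not the authors' code): ConvNeXt backbone without its
# classification head, followed by a hypothetical Feature Enhancement Block.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny

class FeatureEnhancementBlock(nn.Module):
    """Refines backbone feature maps with conv -> norm -> activation layers.
    Channel count, depth, and BatchNorm/GELU choices are assumptions."""
    def __init__(self, channels: int = 768, depth: int = 2):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.GELU(),
            ]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x) + x  # residual path preserves original detail

class ConvNeXtFEBBackbone(nn.Module):
    """ConvNeXt feature extractor (classifier dropped) plus FEB refinement."""
    def __init__(self):
        super().__init__()
        self.features = convnext_tiny(weights="DEFAULT").features  # no classifier
        self.feb = FeatureEnhancementBlock(channels=768)

    def forward(self, images):            # images: (B, 3, 224, 224) RGB frames
        feats = self.features(images)     # (B, 768, 7, 7) hierarchical features
        return self.feb(feats)            # refined features for the transformer
```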

The model processes RGB images from UAV-based surveillance and converts them into visual embeddings. These are then used by the DETR’s encoder-decoder transformer structure to detect fire or smoke using classification and bounding box regression. The pipeline eliminates the need for anchor boxes or post-processing heuristics like non-maximum suppression, streamlining the detection task while improving accuracy on small-scale, early-stage fire events.
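The anchor-free, NMS-free set-prediction step can be illustrated with a minimal detection head on top of that backbone. The hidden size, number of object queries, the two fire/smoke classes, and the use of PyTorch's generic nn.Transformer (with positional encodings omitted) are simplifying assumptions, not the paper's implementation.

```python
# Sketch of the anchor-free detection pipeline described above, assuming the
# ConvNeXtFEBBackbone from the previous snippet (768-channel output).
import torch
import torch.nn as nn

class FireDETR(nn.Module):
    def __init__(self, backbone, hidden_dim=256, num_queries=100, num_classes=2):
        super().__init__()
        self.backbone = backbone
        self.input_proj = nn.Conv2d(768, hidden_dim, kernel_size=1)
        self.transformer = nn.Transformer(d_model=hidden_dim, batch_first=True)
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)  # +1 "no object"
        self.bbox_head = nn.Linear(hidden_dim, 4)                 # (cx, cy, w, h)

    def forward(self, images):                        # (B, 3, 224, 224) RGB frames
        feats = self.input_proj(self.backbone(images))            # (B, 256, 7, 7)
        tokens = feats.flatten(2).transpose(1, 2)                 # (B, 49, 256) embeddings
        queries = self.query_embed.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        hs = self.transformer(tokens, queries)                    # decoder outputs
        return self.class_head(hs), self.bbox_head(hs).sigmoid()  # no anchors, no NMS

# Each query directly predicts one candidate box and a class distribution;
# predictions whose top class is "no object" are simply discarded.
```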

A rigorous ablation study revealed that each architectural enhancement (the ConvNeXt backbone and the FEB) contributed incrementally to performance. The final model achieved a precision of 89.67%, a recall of 86.74%, a mean Average Precision (mAP) of 85.13%, and an F1-score of 92.43%. These results were significantly higher than those of the standard DETR (precision: 86.6%, mAP: 79.66%) and even the highest-performing YOLO variant (YOLOv9s, F1-score: 90.1%).

What was the dataset and testing environment behind these results?

The research team constructed a custom dataset of 8410 image frames sourced from 38 aerial video clips, including UAV surveillance footage and publicly available videos. Of these, 5763 images were manually annotated with bounding boxes indicating fire or smoke. To maintain consistency, all images were resized to 224x224 pixels. The dataset covered a wide range of environmental conditions (daylight, dusk, and overcast) and varied in fire visibility, including smoldering, flame-only, and smoke-only instances.

Augmentation techniques such as rotation, flipping, and color filtering enhanced the training data's diversity, ensuring the model could generalize across different fire scenarios. Testing was performed on a Linux-based system equipped with an NVIDIA RTX 3090 GPU, 128 GB of RAM, and PyTorch 2.0.1, allowing for high-throughput experimentation and optimization.
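A torchvision pipeline of the kind described, resizing frames to 224x224 and applying rotation, flipping, and color filtering, might look like the sketch below. The parameter values are assumptions, and in a real detection setting the bounding boxes would need to be transformed together with the images (for example with torchvision.transforms.v2).

```python
# Illustrative augmentation pipeline matching the transformations described
# above; parameter values are assumptions, not the study's settings.
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),            # all frames resized to 224x224
    transforms.RandomHorizontalFlip(p=0.5),   # mirror flames and smoke plumes
    transforms.RandomRotation(degrees=15),    # simulate varying UAV orientation
    transforms.ColorJitter(brightness=0.3,    # daylight / dusk / overcast variation
                           contrast=0.3,
                           saturation=0.2),
    transforms.ToTensor(),
])
```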

The researchers benchmarked the proposed model against five major YOLO architectures (YOLOv5s through YOLOv9s) and the original DETR implementation. Performance was evaluated with precision (the share of detections that were actual fires), recall (the share of actual fires that were detected rather than missed), and the F1-score (the harmonic mean of precision and recall). On all fronts, the new architecture surpassed the competition, especially in recognizing the subtle indicators of early-stage fires that traditional systems often miss.
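For readers unfamiliar with these metrics, the short Python sketch below shows how precision, recall, and F1-score follow from counts of true positives, false positives, and missed detections. The counts used are placeholders, not the study's confusion-matrix values, which the article does not report.

```python
# How the reported detection metrics are derived from detection counts.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)          # share of predicted fires that were real

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)          # share of real fires that were detected

def f1_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r)     # harmonic mean of precision and recall

# Placeholder counts for illustration only.
p, r = precision(tp=90, fp=10), recall(tp=90, fn=15)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))  # 0.9 0.857 0.878
```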

How can this technology be applied in real-world agricultural settings?

Beyond its academic contributions, the proposed model is designed for real-world deployment. It supports implementation on both edge devices for field-level monitoring and centralized server systems for regional surveillance. This dual applicability makes it suitable for integration into smart agriculture infrastructure, particularly in regions prone to seasonal or human-induced fires.

The system’s strength lies in its capacity to detect fires early, offering farmers and emergency services a critical window for intervention. Early detection not only reduces direct crop loss but also curtails the spread of fire to nearby ecological or urban areas. Additionally, its reliance on UAV imaging and RGB inputs ensures compatibility with widely available agricultural drones, reducing the barrier to adoption.

However, the authors acknowledge some limitations. The model’s effectiveness may degrade in extremely low-light or non-visible spectrum scenarios, such as during night-time surveillance. Future iterations of the system aim to incorporate multimodal inputs, including infrared and multispectral data, to overcome this shortfall. They also plan to enhance performance on resource-constrained devices by reducing model complexity without sacrificing detection accuracy.

  • First published in: Devdiscourse