Oral Theme 2: Data science tools and techniques

NERC Digital Gathering 23

Main Auditorium, Monday 10th July 2023

Chaired by Dr. Andrew Kingdon

15:00

Combining cameras with deep learning to improve flood management strategies

Dr Remy Vandaele
University of Reading
CDE Expert Network member
Vandaele, Remy (1), Dance, Sarah L (1, 2, 3), Ojha, Varun (4)

1 – Department of Meteorology, University of Reading
2 – Department of Mathematics and Statistics, University of Reading
3 – National Centre for Earth Observation, University of Reading
4 – Department of Computer Science, University of Newcastle

In recent years, advances in computer vision and deep learning techniques have opened up new possibilities for utilizing cameras to support flood management related tasks. This presentation provides an overview of our work related to two such applications:

(1) River-level monitoring. Monitoring river water levels is essential for studying floods and mitigating their risks. Our work proposes a method that automatically estimates calibrated water-level indexes from images. We built a dataset of 32,715 images cross-referenced with data from river water-level measurement gauges and used it to train deep convolutional neural networks. This work allows river cameras to be used for automated river water-level monitoring using calibrated indexes.

(2) Trash screen blockage detection. Trash screens are an efficient tool to prevent debris and people from entering critical parts of the river networks. However, debris can pile up at the screen and cause flooding, which makes monitoring and maintaining the screens critical. Our work investigates the use of CCTV cameras processed by deep learning models to automatically monitor trash screen blockage. We explore several trash screen monitoring scenarios, and investigate the suitability of deep learning approaches to automatically label new trash screen images. In particular, we show that a Siamese convolutional neural network reaches an accuracy of 96.87% on a new camera using only 5 labeled images from that camera, without needing to be retrained.
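As a rough illustration of how a handful of labeled images can be enough for a new camera, the sketch below labels a query image by finding the closest labeled reference in an embedding space. This is an assumption about one common way of using Siamese embeddings, not the authors' confirmed method: the 2-D embeddings, the label names and the function `label_by_nearest_reference` are all hypothetical, and a real system would obtain the embeddings from the trained network.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_by_nearest_reference(query, references):
    """Return the label of the reference embedding closest to the query.

    references is a list of (embedding, label) pairs, e.g. the few
    labeled images available for a newly installed camera.
    """
    _, best_label = min(references, key=lambda ref: euclidean(query, ref[0]))
    return best_label


# Hypothetical embeddings for five labeled images from a new camera.
# In practice these would be produced by the trained Siamese network.
references = [
    ((0.1, 0.2), "clear"),
    ((0.0, 0.3), "clear"),
    ((0.2, 0.1), "clear"),
    ((0.9, 0.8), "blocked"),
    ((1.0, 0.9), "blocked"),
]

# Hypothetical embedding of an unlabeled image from the same camera.
print(label_by_nearest_reference((0.85, 0.75), references))  # prints "blocked"
```

Because the network itself is unchanged, adding a camera in this sketch only means adding a few reference embeddings, which is consistent with the abstract's point that no retraining is needed.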

In conclusion, this presentation showcases our research on the potential of deep learning techniques in managing floods through cameras. The findings of these studies contribute to the advancement of flood management strategies and can significantly enhance the resilience of communities facing flood risks.


15:20

Counting sea pens from ocean floor video footage

Dr Meghna Asthana
The Alan Turing Institute
Asthana, M (1), R.E. Blackwell (1, 2), A. L. Downie (2), J.S. Hosking (1, 2, 3)

1 – The Alan Turing Institute
2 – Centre for Environment, Fisheries and Aquaculture Science
3 – British Antarctic Survey

Pennatulacean octocorals (sea pens) are marine coelenterates forming a feather-shaped colony with a horny or calcareous skeleton. They are not only an indicator of the health of muddy ecosystems but are also considered important habitat-forming organisms for many megafaunal organisms. The Centre for Environment, Fisheries and Aquaculture Science holds a large collection of video footage of the sea floor, collected over the period 2014-2021, which trained scientists have used to manually quantify sea pens. However, this task has become more challenging every year as the equipment has evolved. The result is large variability in visibility, quality and lighting conditions, and thus a need to homogenise the footage. Footage standardisation is a key challenge because it would allow sustainable automated monitoring of ocean health.

The objective of this challenge was to investigate the potential of computer vision and deep learning to undertake a large-scale, automated study of the available data. Specifically, we set three goals: first, a methodology for reliable localisation and classification of sea pens in the footage; second, enumerating and tagging sea pens as they appear in the footage; and finally, enhancing and standardising footage from every year to correct for variable lighting, camera and other technical conditions.

We have achieved successful identification using a You Only Look Once (YOLOv5) model across all years, with an average accuracy of 90 percent against annotations made by trained humans, and classification of Pennatula and Virgularia species with 98 percent accuracy. Furthermore, we present excellent tracking and laser detection results, along with enhanced video quality using image preprocessing methods such as CLAHE. We aim to apply these state-of-the-art techniques to a wider range of biodiversity monitoring problems.
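To make the CLAHE step more concrete, here is a minimal sketch of its core idea, contrast-limited histogram equalisation, applied to a single tile of grey levels. It is a simplified illustration under stated assumptions and not the pipeline used in the talk: the function name `clipped_equalise`, the tile values and the clip limit are hypothetical, and full CLAHE additionally splits the frame into many tiles and interpolates between their mappings (libraries such as OpenCV provide this).

```python
def clipped_equalise(tile, clip_limit, levels=256):
    """Contrast-limited histogram equalisation of one tile.

    tile is a list of rows of integer grey levels in [0, levels).
    Histogram bins are capped at clip_limit and the clipped excess is
    spread evenly over all bins before the cumulative mapping is built.
    """
    pixels = [p for row in tile for p in row]

    # Histogram of grey levels in the tile.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip tall bins and redistribute the excess evenly (simple variant:
    # bins may end up slightly above the limit after redistribution).
    excess = sum(max(h - clip_limit, 0) for h in hist)
    hist = [min(h, clip_limit) + excess / levels for h in hist]

    # Cumulative distribution turned into a grey-level lookup table.
    total = len(pixels)
    mapping = []
    running = 0.0
    for h in hist:
        running += h
        mapping.append(round((levels - 1) * running / total))

    return [[mapping[p] for p in row] for row in tile]


# Hypothetical low-contrast tile: grey levels span only 100..103.
tile = [
    [100, 100, 100, 101],
    [100, 100, 101, 102],
    [100, 101, 102, 103],
    [100, 100, 100, 100],
]

enhanced = clipped_equalise(tile, clip_limit=4)
for row in enhanced:
    print(row)
```

The clip limit caps how much a dominant grey level (here the background value 100) can stretch the mapping, which is what keeps noise in uniform regions from being over-amplified.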


15:40

Multimodal Data Analysis For Monitoring Invasive Aquatic Weeds In India

Dr Deepayan Bhowmik
Newcastle University
CDE Expert Network member
Savitri Maharaj (1), Deepayan Bhowmik (2), Armando Marino (1), Srikanth Rupavatharam (3), G Nagendra Prabhu (4), Adam Kleczkowski (5), J Alice RP Sujeetha (6), Vahid Akbari (1) and Aviraj Datta (3)
1 – University of Stirling
2 – Newcastle University
3 – International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), India
4 – Sanatana Dharma College, University of Kerala, India
5 – University of Strathclyde
6 – National Institute of Plant Health Management (NIPHM), India

The aim of this project is to develop effective methods that combine multiple data sources (satellite and drone observations and ground-based sensors) to monitor the spread of the world’s most invasive aquatic weed, water hyacinth (Eichhornia crassipes), in neglected and inaccessible water bodies in India.

Invasive aquatic weeds are a serious problem affecting many parts of Asia and Africa. They cause severe degradation of the aquatic environment, with damaging impacts on fisheries, drinking water sources, agricultural irrigation, rice cultivation, navigation and recreational use of water bodies. Lakes become breeding grounds for mosquitoes, carrying diseases such as Chikungunya that endanger human health. The effects are felt severely in many parts of India, where agriculture and the tourism industry depend heavily on the local lakes and rivers, almost all of which are heavily infested with water hyacinth (WH).
Attempts to control the weed usually involve manual or mechanical removal. However, it is impossible to remove all traces of the weed completely, and it regenerates from left-behind seeds and fragments. Regrowth in small, neglected and inaccessible side streams and pools remains undetected until the spread is extensive and has reached economically important water bodies, which become re-infected and damaged. Early detection of regrowth has the potential to cut the cost of control by allowing the weed to be removed before it has reached a damaging level.

We investigate technological methods for early detection of water hyacinth regrowth, combining data from satellite, drone, and ground-level sensors. Multi-modal data has the benefit of providing both extensive area coverage and sufficient resolution to survey small water bodies. We devise algorithms for combining multi-modal data for effective detection, carry out pilot trials of our methods, and ensure the sustainability of results through training and dissemination activities.


Refreshment Break 16:00 – 16:20


16:20

Scivision and EDS book: making computer vision and data science more accessible for environmental scientists

Dr Alejandro Coca Castro
Research Fellow, Alan Turing Institute
CDE Expert Network member – Early Career Expert
Scivision authors: Coca Castro, A. (1), Conner, A. (1), Corcoran, E. (1), Costa Gomes, B., Fenton, I. (1), Famili, M. (1), Mehonic, A. (1), Strickson, O. (1), Van Zeeland, L. (1), Anhert, S. (1, 2), Lowe, A. (1, 3), Hosking, J. S (1, 4)
EDS book authors: Coca Castro, A. (1), Hosking, J. S (1, 4), EDS book community (3)
Affiliations (1 – The Alan Turing Institute, 2 – University College London, 3 – University of Cambridge, 4 – British Antarctic Survey, 5 – Multiple)

Supported by interdisciplinary collaborations between teams from environmental, statistics and computer sciences, the past decade has seen accelerated development of environmental data, models and pipelines. This talk will highlight how two community-driven initiatives, created and maintained by the Alan Turing Institute, make research products in environmental science more accessible and discoverable. Scivision (https://sci.vision) is an open-source software tool, an open catalogue of datasets and models, and a community of computer vision experts and users. Scivision aims to accelerate scientific computer vision by sharing and matching models and datasets through the Scivision catalogue. The models in the catalogue have a common interface and are designed to be installable and runnable by someone without a computer science background; the datasets indicate their domain of application and any tasks that they may be suitable for, so that they are discoverable by computer vision model developers. Scivision has been applied to environmental use cases to analyse image datasets across different scales and formats including tree crown detection from drone imagery, coastal vegetation edge detection from satellite imagery, automated extraction of plant phenotype data from multiple 2D views of whole plants, among others. Scivision has also been utilised in one of Turing’s data study groups, which involved identification of plankton species using computer vision.

EDS book (http://www.edsbook.org) is an online resource leveraging executable notebooks, cloud computing resources and technical implementations of the FAIR (Findable, Accessible, Interoperable and Reusable) principles to support the publication of datasets, innovative research and open-source tools in environmental science. EDS book provides practical guidelines and templates that make the most of open infrastructure services to translate research outputs into curated, interactive, shareable and reproducible executable notebooks, which benefit from a collaborative and transparent reviewing process. To date, the community has published multiple Python-based notebooks covering a wide range of topics in environmental data science. More recently, EDS book successfully partnered with the 12th International Conference on Climate Informatics and Environmental Data Science Journal to deliver hands-on training and the underpinning framework for a reproducibility hackathon to support open-source climate research. In future work, we expect to increase contributions showcasing scalable and interoperable open-source developments in the Julia and R programming languages and to engage research networks interested in improving scientific software practices in environmental science.


16:40

OpenGHG: A community platform for greenhouse gas data science

Dr Rachel Tunnicliffe
University of Bristol
Rachel Tunnicliffe (1)
Matthew Rigby (1)
Gareth Jones (1, 2)
Christopher Woods (2)
Affiliations
(1) – School of Chemistry, University of Bristol
(2) – Advanced Computing Research Centre, University of Bristol

There is an urgent need to understand how and why the concentrations of greenhouse gases (GHG) are changing in the atmosphere. Estimates of changes in GHG concentrations and fluxes are vital for decision makers who compile or interpret national GHG emissions inventories or wish to track progress of international climate agreements. The volume and diversity of GHG data are increasing rapidly, and interpreting the data requires a broad range of expertise, computational models and methods. OpenGHG is a multi-platform toolkit that allows researchers to aggregate and standardise GHG measurements and model-derived products. Furthermore, it provides researchers with a set of tools that are vital for analysing and interpreting GHG observations and will be used to underpin a new prototype emissions evaluation system for the UK. In this talk, I will provide an overview of the design principles behind OpenGHG and present case studies on how the system can be used to enable GHG data analysis. Finally, I will show how the OpenGHG platform is being used to communicate GHG data products to the public and stakeholders through web-based visualisation tools.

For further details, visit openghg.org.
