Deep learning for historical water data digitization
Position Description
Managing water resources for agriculture, industry, ecosystems, and human consumption requires an intricate understanding of how much water there is, where it comes from, and how it moves. Historical data on rain, streamflow, and groundwater are crucial because they help establish a baseline of how water behaves in an area and allow researchers and water managers to track how that behavior has changed. While the United States government has enormous archives of such historical data, these data are largely inaccessible to researchers because they exist only as scanned documents that have not been converted to a machine-readable form (example here). This project will leverage recent advances in deep learning to restore these archival water data to the public record. Using already hand-digitized records as training and test data, we will devise a computer vision strategy to extract and organize the tabulated data into easily usable .csv files. This work will enable researchers (including us!) to use these measurements to gain new insight into water resource management in the United States.
Responsibilities
The chosen candidate will be responsible for:
Reviewing existing optical character recognition (OCR) and computer vision models online and in the literature (e.g., Tesseract)
Building one or more deep learning models to digitize historical typewritten data
Training and testing the model(s)
Organizing and preparing data for input into the model
Qualifications
Registered, continuing UCSB undergraduate student in good academic standing. Students graduating in Spring or Summer 2024 do not qualify.
Proficiency in Python programming (Python is the required language for this project)
Experience with deep learning or a demonstrated ability to learn quickly and independently
Proficiency using GitHub
Interest in water resources and environmental science is a bonus, but not required
Details
The desired start date for the position is early May 2024. This position spans multiple quarters. The candidate must be available to work part-time (~10 hours/week) during spring quarter, full-time (~30-35 hours/week) during both summer sessions, and part-time (~10 hours/week) during fall quarter. The exact hours per week and number of weeks employed are flexible, depending on the student's schedule and the progress of the project. The student will be paid at a rate of $20/hour. The work may be completed remotely or in person.
Spring Quarter: ~6 weeks at 10 hours/week
Summer Sessions: ~12 weeks at 30-35 hours/week
Fall Quarter: ~6 weeks at 10 hours/week
The student will be co-mentored by PhD students Annette Hilton and Anna Boser. Depending on the project's success and the student's involvement, there will be an opportunity to attend the annual American Geophysical Union (AGU) conference in Washington, DC, in December 2024 with the graduate mentors.
How to Apply
Please apply for this position through Handshake. Please direct any questions about the position or application to Annette Hilton (ahilton@ucsb.edu). The deadline to apply is Friday, April 19th. In your application package, please include the following:
A one-paragraph description of why you are interested in the project and how your previous experience and qualifications make you a good fit.
CV or resume, including GPA, relevant coursework, any relevant research experience or personal projects, and anticipated quarter and year of graduation.
Unofficial transcript