
Cursor/download and prepare overhead imagery 039a #42


Merged

Conversation

bw4sz
Collaborator

@bw4sz bw4sz commented Jun 17, 2025

This pull request introduces the AutoArborist dataset, which automates the acquisition and annotation of overhead imagery for tree locations. It includes a dataset summary, annotations for Calgary, and a sample of tree locations. The changes establish a structured format for tree annotation and provide initial data for Calgary.

Dataset Introduction:

  • Added AutoArborist_summary.md to provide an overview of the AutoArborist dataset, including processed cities, dataset format, and imagery sources. This document outlines the structure and purpose of the dataset.

Calgary Annotations:

  • Created calgary_annotations.csv to include tree annotations for Calgary with details such as image paths, pixel coordinates, genus, taxonomy ID, and geometry.
  • Added chips/calgary_imagery.csv with similar Calgary tree annotations, formatted for chip-based imagery processing.

Tree Locations Sample:

  • Added CalgaryTrees_sample.csv containing a sample of tree locations in Calgary with longitude, latitude, genus, and taxonomy ID for reference.
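The description above pairs ground tree locations (longitude, latitude) with pixel coordinates in overhead imagery. The PR text does not show how that conversion is done, so the following is only a minimal sketch of one common approach: a north-up affine mapping from map coordinates to pixel coordinates. The `NorthUpTransform` class and all numeric values are hypothetical, and the sketch assumes the tree coordinates have already been reprojected into the image's coordinate reference system.

```python
from dataclasses import dataclass


@dataclass
class NorthUpTransform:
    """Hypothetical georeferencing for a north-up image (no rotation)."""

    x_origin: float    # map x of the image's top-left corner
    y_origin: float    # map y of the image's top-left corner
    pixel_size: float  # map units per pixel (square pixels assumed)

    def to_pixel(self, x: float, y: float) -> tuple[float, float]:
        """Convert map coordinates to (column, row) pixel coordinates."""
        col = (x - self.x_origin) / self.pixel_size
        row = (self.y_origin - y) / self.pixel_size  # rows grow downward
        return col, row


# Made-up values for illustration only, expressed in the image's CRS.
transform = NorthUpTransform(x_origin=1000.0, y_origin=5000.0, pixel_size=0.5)
col, row = transform.to_pixel(x=1100.0, y=4900.0)
print(col, row)
```

Points that fall outside the image bounds (negative or too-large column and row values) would need to be dropped before writing annotations.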

@bw4sz bw4sz requested a review from Copilot June 17, 2025 17:11

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces the AutoArborist dataset for automated overhead imagery acquisition and tree annotations, along with a summary of processed data for Calgary.

  • Added dataset summary documentation in AutoArborist_summary.md
  • Introduced Calgary tree annotations in two CSV files (calgary_annotations.csv and chips/calgary_imagery.csv)
  • Provided a sample of tree locations in CalgaryTrees_sample.csv

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| data_prep/tree_locations/CalgaryTrees_sample.csv | New sample CSV file containing tree location data. |
| data_prep/AutoArborist/chips/calgary_imagery.csv | CSV file with chip-based imagery annotations for trees in Calgary. |
| data_prep/AutoArborist/calgary_annotations.csv | CSV file with tree annotations for Calgary imagery. |
| data_prep/AutoArborist/AutoArborist_summary.md | Markdown summary outlining dataset structure, statistics, and sources. |
Comments suppressed due to low confidence (2)

data_prep/AutoArborist/AutoArborist_summary.md:15

  • The 'Failed Cities' section is currently empty. Consider adding a note (e.g., 'None') or removing the section to avoid confusion.
## Failed Cities

data_prep/AutoArborist/chips/calgary_imagery.csv:3

  • Since the same image filename appears with different coordinate entries, consider clarifying in the documentation that multiple tree annotations per image are expected.
calgary_imagery_1.png,364.48123331647326,1636.4895913186558,Tree,AutoArborist,1.0,prunus,235,POINT (364.48123331647326 1636.4895913186558)
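To illustrate the point about multiple annotations per image, here is a small sketch that groups annotation rows by image filename. The column names are assumptions, since the excerpt shows values but no header, and the second row is made up purely so that the grouping has something to group.

```python
import csv
import io
from collections import defaultdict

# Column names are assumptions; the PR excerpt shows values but no header.
COLUMNS = ["filename", "x", "y", "label", "source", "score",
           "genus", "taxonomy_id", "geometry"]

# The first row is quoted from the review; the second is made up for illustration.
raw = (
    "calgary_imagery_1.png,364.48123331647326,1636.4895913186558,Tree,"
    "AutoArborist,1.0,prunus,235,POINT (364.48123331647326 1636.4895913186558)\n"
    "calgary_imagery_1.png,100.0,200.0,Tree,AutoArborist,1.0,prunus,235,"
    "POINT (100.0 200.0)\n"
)

# Collect every (x, y) point under the image it belongs to.
points_by_image = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw), fieldnames=COLUMNS):
    points_by_image[row["filename"]].append((float(row["x"]), float(row["y"])))

for filename, points in points_by_image.items():
    print(filename, len(points), "annotations")
```

Each image maps to a list of points, so repeated filenames in the CSV are expected rather than a sign of duplicated rows.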

@bw4sz bw4sz requested a review from jveitchmichaelis June 17, 2025 17:18
@bw4sz
Collaborator Author

bw4sz commented Jun 17, 2025

My goal here is to continue exploring background agents and what is possible with them. I was able to generate the backbone of this work, but was nervous about running a full example inside the container.

@bw4sz
Collaborator Author

bw4sz commented Jun 19, 2025

[image attachment]

I think this PR is ready. It proposes a workflow that successfully matches airborne imagery to ground data. The next steps, in follow-up PRs, will be:

  1. Increase the number of data sources.
  2. Provide some kind of flag, or overall scheme, to help users select which datasets they want to use. Clearly there will be data quality issues when dealing with millions of ground truth points, and we want to provide functionality to help navigate that.

@bw4sz bw4sz force-pushed the cursor/download-and-prepare-overhead-imagery-039a branch from 9fb817e to 0ef7df6 Compare June 19, 2025 20:57
@bw4sz
Collaborator Author

bw4sz commented Jun 24, 2025

I think I've done as much as I want here. Once I fix the tests on main and rebase, this is ready to merge.

@bw4sz
Collaborator Author

bw4sz commented Jun 24, 2025

>>> combined_df.city.value_counts()
city
calgary            260256
edmonton           244754
new_york           192804
seattle             62267
columbus            32268
montreal             7105
charlottesville      4020
Name: count, dtype: int64
>>> combined_df.shape
(803474, 10)
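As a quick consistency check on the output above, the per-city counts should add up to the number of rows reported by `combined_df.shape`. The sketch below copies the counts into a plain dictionary so it runs without pandas; with the real frame, `combined_df.city.value_counts().sum()` would presumably give the same total.

```python
# Per-city counts copied from the value_counts() output above.
city_counts = {
    "calgary": 260256,
    "edmonton": 244754,
    "new_york": 192804,
    "seattle": 62267,
    "columbus": 32268,
    "montreal": 7105,
    "charlottesville": 4020,
}

total = sum(city_counts.values())
print(total)  # 803474, the row count in combined_df.shape

# Share of each city as a percentage of all annotations.
shares = {city: round(100 * n / total, 1) for city, n in city_counts.items()}
for city, share in shares.items():
    print(f"{city}: {share}%")
```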

@bw4sz
Collaborator Author

bw4sz commented Jun 27, 2025

As part of this PR, I have reorganized the tests to separate checking the download URLs from testing the release version. This makes the test suite more logical. We still need to test the releases on HiPerGator before each PyPI push.

@bw4sz
Collaborator Author

bw4sz commented Jun 27, 2025

These errors come from missing images in the docs for unrelated datasets. I am reprocessing, but @jveitchmichaelis, I think this can be merged, and we are moving closer to stable. On the dataset front we have one more PR, and then we have to address #43.

@jveitchmichaelis
Contributor

jveitchmichaelis commented Jun 28, 2025

Sure, do you want to keep this summary: data_prep/AutoArborist/AutoArborist_summary.md? (shows only 4 trees)

@bw4sz
Collaborator Author

bw4sz commented Jun 28, 2025

> Sure, do you want to keep this summary: data_prep/AutoArborist/AutoArborist_summary.md? (shows only 4 trees)

No, I will delete it.

@bw4sz
Collaborator Author

bw4sz commented Jun 30, 2025

@jveitchmichaelis can you merge this? The two errors are

/home/runner/work/MillionTrees/MillionTrees/docs/datasets.md:222: WARNING: image file not readable: public/Li_et_al._2023.png [image.not_readable]
/home/runner/work/MillionTrees/MillionTrees/docs/datasets.md:266: WARNING: image file not readable: public/Takeshige_et_al._2025.png [image.not_readable]

unrelated to auto-arborist.
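One way to triage warnings like the two quoted above is to pull out the source file, line number, and image path from each log line. This is only a sketch: the regular expression is written against the two lines shown here and is an assumption about the general log format, not a documented contract.

```python
import re

# The two warning lines quoted in the comment above.
log = """\
/home/runner/work/MillionTrees/MillionTrees/docs/datasets.md:222: WARNING: image file not readable: public/Li_et_al._2023.png [image.not_readable]
/home/runner/work/MillionTrees/MillionTrees/docs/datasets.md:266: WARNING: image file not readable: public/Takeshige_et_al._2025.png [image.not_readable]
"""

# Pattern inferred from the lines above; other warning types are ignored.
PATTERN = re.compile(
    r"^(?P<source>.+?):(?P<line>\d+): WARNING: image file not readable: "
    r"(?P<image>\S+) \[image\.not_readable\]$",
    re.MULTILINE,
)

missing = [(m["source"], int(m["line"]), m["image"]) for m in PATTERN.finditer(log)]
for source, line, image in missing:
    print(f"{source} line {line}: missing {image}")

# Neither missing image path mentions AutoArborist.
unrelated = all("arborist" not in image.lower() for _, _, image in missing)
print(unrelated)
```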

My aim is to have a more stable CI by the end of the week, so that we can be more rigorous with PRs. Then we deal with image filtering.

Contributor

@jveitchmichaelis jveitchmichaelis left a comment


Good to merge once the summary and related annotation files are removed.

@jveitchmichaelis jveitchmichaelis dismissed their stale review June 30, 2025 16:26

Removed files

@jveitchmichaelis jveitchmichaelis merged commit b67fb51 into main Jun 30, 2025
0 of 2 checks passed
@jveitchmichaelis jveitchmichaelis deleted the cursor/download-and-prepare-overhead-imagery-039a branch June 30, 2025 16:29

3 participants