v.what: Update JSON output format #6252

Merged
petrasovaa merged 6 commits into OSGeo:main from NishantBansal2003:json/v-what
Sep 2, 2025

@NishantBansal2003 NishantBansal2003 commented Aug 23, 2025

Fixes: #6226

This PR updates the JSON output of the v.what module.

The JSON output looks like:

[
    {
        "coordinate": {
            "easting": 542690.40000000002,
            "northing": 204802.70000000001
        },
        "map": "hospitals",
        "mapset": "PERMANENT",
        "id": 22,
        "type": "point",
        "data": [
            {
                "layer": 1,
                "category": 22,
                "attributes": {
                    "cat": 22,
                    "OBJECTID": 22,
                    "AREA": 0,
                    "PERIMETER": 0,
                    "HLS_": 22,
                    "HLS_ID": 22,
                    "NAME": "Randolph Hospital",
                    "ADDRESS": "364 White Oak St",
                    "CITY": "Asheboro",
                    "ZIP": "27203",
                    "COUNTY": "Randolph",
                    "PHONE": "(336) 625-5151",
                    "CANCER": "yes",
                    "POLYGONID": 0,
                    "SCALE": 1,
                    "ANGLE": 1
                }
            }
        ]
    }
]

Note: For more information on the structure of the JSON with different flags, check out the test_v_what_output.py file.
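
A minimal sketch of reading this structure in Python. It assumes the JSON text has already been captured as a string (for example from a v.what run with format=json in a GRASS session; that step is not shown). The sample is shortened from the example above, and the `describe` helper is a hypothetical name used only for illustration.

```python
import json

# Sample output shortened from the example above.
output = """
[
    {
        "coordinate": {"easting": 542690.4, "northing": 204802.7},
        "map": "hospitals",
        "mapset": "PERMANENT",
        "id": 22,
        "type": "point",
        "data": [
            {
                "layer": 1,
                "category": 22,
                "attributes": {"cat": 22, "NAME": "Randolph Hospital", "CITY": "Asheboro"}
            }
        ]
    }
]
"""

features = json.loads(output)


def describe(feature):
    """Return one summary line per layer-category entry of a feature."""
    lines = []
    for entry in feature["data"]:
        # "attributes" may be absent when only layer-category pairs are output.
        name = entry.get("attributes", {}).get("NAME", "(no name)")
        lines.append(
            f'{feature["map"]}@{feature["mapset"]} id={feature["id"]} '
            f'layer={entry["layer"]} cat={entry["category"]}: {name}'
        )
    return lines


for feature in features:
    for line in describe(feature):
        print(line)
```

Using `.get("attributes", {})` keeps the same code usable for results that carry only layer-category pairs.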

This PR includes the following changes:

  1. Generates the JSON output with the parson library instead of writing it manually.
  2. Adds tests covering each of the new formats.
  3. Adds a Python example for parsing JSON output to the documentation.

Note: The old code had a somewhat clunky control flow. I have ensured that the updated JSON support matches the initial requirements. If anything was missed, please let me know.

@github-actions github-actions bot added the vector, Python, C, libraries, module, docs, markdown, tests, and CMake labels Aug 23, 2025
@wenzeslaus
Member

After thinking more about this, I have a list of things to (re)consider here.

Output attributes according to their types

In the example, cat is an integer, and the other numeric values are probably stored as numbers in the database, so they should be output as JSON numbers rather than strings. SQL NULL should go out as JSON null...
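
To illustrate why typed output matters on the consumer side, here is a small sketch comparing a string-only attributes dictionary with a typed one. Both fragments are hypothetical (the null PHONE value in particular is made up to show NULL handling), and `numeric_columns` is an illustrative helper, not part of GRASS.

```python
import json

# Hypothetical "attributes" fragments: all values as strings vs. typed values.
as_strings = json.loads('{"cat": "22", "AREA": "0", "NAME": "Randolph Hospital", "PHONE": null}')
as_typed = json.loads('{"cat": 22, "AREA": 0, "NAME": "Randolph Hospital", "PHONE": null}')


def numeric_columns(attributes):
    """Return the names of columns whose values arrived as JSON numbers."""
    return [
        name
        for name, value in attributes.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]


print(numeric_columns(as_strings))
print(numeric_columns(as_typed))
# JSON null is parsed as Python None, so NULL checks need no string comparison.
print(as_typed["PHONE"] is None)
```

With string-only output, a caller would have to guess which values to convert; with typed output, the JSON parser does that work.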

Make type names all lower case

My guess is that the uppercase letter at the beginning of Point is a leftover from the plain text output. I am not aware of any example of an uppercase type name in GRASS. The type parameter has the possible values point, line, boundary, centroid, area, face, kernel.

[
    {
        "coordinate": {
            "easting": 542690.40000000002,
            "northing": 204802.70000000001
        },
        "map": "hospitals",
        "mapset": "PERMANENT",
        "type": "point",
        "id": 22,
        "categories": [
        ...

No connection information

Seeing this together with JSON output of v.db.connect (#6077) made me think that we don't need to output connection information for every geometry id because you can get it from v.db.connect:

[
    {
        "coordinate": {
            "easting": 542690.40000000002,
            "northing": 204802.70000000001
        },
        "map": "hospitals",
        "mapset": "PERMANENT",
        "type": "Point",
        "id": 22,
        "categories": [
            {
                "layer": 1,
                "category": 22,
                "driver": "sqlite",
                "database": "/grassdata/nc_spm_08_grass7/PERMANENT/sqlite/sqlite.db",
                "table": "hospitals",
                "key_column": "cat",
                "attributes": {
                    "cat": "22",
                    "OBJECTID": "22",
                    "AREA": "0",
                    "PERIMETER": "0",
                    "HLS_": "22",
                    "HLS_ID": "22",
                    "NAME": "Randolph Hospital",
                    "ADDRESS": "364 White Oak St",
                    "CITY": "Asheboro",
                    "ZIP": "27203",
                    "COUNTY": "Randolph",
                    "PHONE": "(336) 625-5151",
                    "CANCER": "yes",
                    "POLYGONID": "0",
                    "SCALE": "1",
                    "ANGLE": "1"
                }
            }
        ]
    }
]

v.db.connect output with #6077 (hand-written):

[
    {
        "layer": 1,
        "name": "hospitals",
        "table": "hospitals",
        "driver": "sqlite",
        "database": "/grassdata/nc_spm_08_grass7/PERMANENT/sqlite/sqlite.db",
        "key": "cat"
    }
]
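
A sketch of how a caller could recover the connection details by combining the two outputs instead of having them repeated for every feature. Both inputs are shortened, illustrative samples; the v.what sample uses the final `data` key, and `attach_connections` is a hypothetical helper name.

```python
import json

# Shortened v.what-style result without connection details (illustrative).
what = json.loads("""
[{"map": "hospitals", "mapset": "PERMANENT", "id": 22, "type": "point",
  "data": [{"layer": 1, "category": 22}]}]
""")

# v.db.connect-style result, following the hand-written example above.
connections = json.loads("""
[{"layer": 1, "name": "hospitals", "table": "hospitals", "driver": "sqlite",
  "database": "/grassdata/nc_spm_08_grass7/PERMANENT/sqlite/sqlite.db",
  "key": "cat"}]
""")


def attach_connections(features, connections):
    """Pair each layer-category entry with the connection of its layer."""
    by_layer = {conn["layer"]: conn for conn in connections}
    rows = []
    for feature in features:
        for entry in feature["data"]:
            # A layer may have no attribute table connected at all.
            conn = by_layer.get(entry["layer"])
            rows.append({
                "id": feature["id"],
                "layer": entry["layer"],
                "category": entry["category"],
                "table": conn["table"] if conn else None,
                "key": conn["key"] if conn else None,
            })
    return rows


rows = attach_connections(what, connections)
print(rows)
```

The lookup is keyed by layer number, which is the field both outputs share in the examples above.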

data instead of categories for layer, category, and attributes

The list of dictionaries that contains the layer-category pairs, and possibly attributes, is now called categories, but data is more general and better captures both the attributes and the plainer layer-category pairs.

[
    {
        "coordinate": {
            "easting": 542690.40000000002,
            "northing": 204802.70000000001
        },
        "map": "hospitals",
        "mapset": "PERMANENT",
        "type": "Point",
        "id": 22,
        "data": [
            {
                "layer": 1,
                "category": 22,
                "attributes": {
                    "cat": "22",
                    "OBJECTID": "22",
                    "AREA": "0",
                    "PERIMETER": "0",
                    "HLS_": "22",
                    "HLS_ID": "22",
                    "NAME": "Randolph Hospital",
                    "ADDRESS": "364 White Oak St",
                    "CITY": "Asheboro",
                    "ZIP": "27203",
                    "COUNTY": "Randolph",
                    "PHONE": "(336) 625-5151",
                    "CANCER": "yes",
                    "POLYGONID": "0",
                    "SCALE": "1",
                    "ANGLE": "1"
                }
            }
        ]
    }
]

While attributes would be a good name if attributes were always present (and we would then need a different name for the inner dictionary holding the actual attributes), it would not work for the (admittedly more advanced) case when only layer-category pairs are present. On the other hand, data works just fine in that case (although the current categories is more fitting):

[
    {
        ...
        "id": 22,
        "data": [
            {
                "layer": 1,
                "category": 22
            },
            {
                "layer": 2,
                "category": 105
            },
            {
                "layer": 2,
                "category": 290
            },
            {
                "layer": 3,
                "category": 1
            }
        ]
    }
]
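
As a sketch of how the plain layer-category case might be consumed, the following groups the categories of one feature by layer. The input mirrors the example above; `categories_by_layer` is a hypothetical helper name.

```python
from collections import defaultdict

# One feature with several layer-category pairs and no attributes.
feature = {
    "id": 22,
    "data": [
        {"layer": 1, "category": 22},
        {"layer": 2, "category": 105},
        {"layer": 2, "category": 290},
        {"layer": 3, "category": 1},
    ],
}


def categories_by_layer(feature):
    """Map each layer number to the list of categories found in that layer."""
    grouped = defaultdict(list)
    for entry in feature["data"]:
        grouped[entry["layer"]].append(entry["category"])
    return dict(grouped)


print(categories_by_layer(feature))
```

A flat list of pairs keeps the output uniform whether or not a layer has more than one category; grouping is left to the caller.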

Switch order of id and type

The id key should come sooner. While map and mapset make sense when multiple vector maps are in the input, id is what really identifies the specific feature (geometry), while type is only the type of the geometry, an attribute or property of sorts.

[
    {
        "map": "hospitals",
        "mapset": "PERMANENT",
        "id": 22,
        "type": "Point",
        "data": [
            {
            ...

Alternative 1: Only key column from the connection information

We could leave out the connection information except for the key column, to support correct usage of the key column (i.e., writing more general code that uses a variable key column instead of relying on the name always being cat).

[
    {
        "map": "hospitals",
        "mapset": "PERMANENT",
        "id": 22,
        "type": "Point",
        "data": [
            {
                "layer": 1,
                "category": 22,
                "key_column": "cat",
                "attributes": {
                    "cat": "22",
                    "OBJECTID": "22",
                    "AREA": "0",
                    "PERIMETER": "0",
                    "HLS_": "22",
                    "HLS_ID": "22",
                    "NAME": "Randolph Hospital",
                    "ADDRESS": "364 White Oak St",
                    "CITY": "Asheboro",
                    "ZIP": "27203",
                    "COUNTY": "Randolph",
                    "PHONE": "(336) 625-5151",
                    "CANCER": "yes",
                    "POLYGONID": "0",
                    "SCALE": "1",
                    "ANGLE": "1"
                }
            }
        ]
    }
]

However, we also don't include a list of columns, or a list of columns and their types, here, so that is perhaps again a job for a separate tool (v.db.connect and v.info now, and possibly v.db.columns in the future).
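
The kind of general code that Alternative 1 is meant to enable could look like the sketch below. It assumes the proposed `key_column` field and string-valued attributes as in the example above; the `fid` key column and the `key_matches_category` helper are hypothetical.

```python
# Entry shaped like the Alternative 1 proposal (attribute values are strings).
entry = {
    "layer": 1,
    "category": 22,
    "key_column": "cat",
    "attributes": {"cat": "22", "NAME": "Randolph Hospital"},
}

# Hypothetical table whose key column is not named "cat".
custom_entry = {
    "layer": 1,
    "category": 5,
    "key_column": "fid",
    "attributes": {"fid": "5", "NAME": "Example"},
}


def key_matches_category(entry):
    """Check that the key column value equals the entry's category."""
    # Fall back to "cat" only when no key column is reported.
    key_column = entry.get("key_column", "cat")
    value = entry["attributes"][key_column]
    # Values are strings in this proposal, so compare as integers.
    return int(value) == entry["category"]


print(key_matches_category(entry))
print(key_matches_category(custom_entry))
```

Without `key_column` in the output, the same code would need a separate v.db.connect call to learn the column name.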

Alternative 2: Output connection information optionally

Introduce a flag to output the attribute database connection:

-d  Print topological information (debugging)
-a  Print attribute information
-g  Print the stats in shell script style
-j  Print the stats in JSON
-m  Print multiple features if overlapping features are found
-i  Print attribute database connection information  <-- New

Similarly, we could add info about the attribute column types as another convenience feature (in the future):

-d  Print topological information (debugging)
-a  Print attribute information
-g  Print the stats in shell script style
-j  Print the stats in JSON
-m  Print multiple features if overlapping features are found
-i  Print attribute database connection information  <-- New
-c  Print attribute column types information  <-- New

or:

-c  Print attribute database connection information
-t  Print attribute column types information

Note: Keep the coordinates key as is

I agree that coordinates should stay as is. The code loops over the input coordinates (if there are several), so coordinates could theoretically be the first level with all features nested under them, but going by features (geometries) is more natural and more expected. The coordinates part of the query is still relevant because one feature can be output multiple times for different parts of the query (if it falls into the buffer of two different coordinates); the coordinates therefore explain why the same id is included more than once. While I don't see a specific use case for it now, having the coordinates keeps the spirit of the tool. At the same time, not using them for the hierarchy makes the output readily usable for common cases such as "what area (attribute) is at these coordinates" or "what geometry is closest to a mouse click".
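
A sketch of how a caller could use the coordinate key to see why the same id appears more than once. The result list is illustrative (the second and third coordinate pairs are made up), and `coordinates_by_feature` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative result: feature 22 falls within the buffer of two query points.
results = [
    {"coordinate": {"easting": 542690.4, "northing": 204802.7}, "id": 22},
    {"coordinate": {"easting": 542700.0, "northing": 204810.0}, "id": 22},
    {"coordinate": {"easting": 600000.0, "northing": 220000.0}, "id": 31},
]


def coordinates_by_feature(results):
    """Map each feature id to the query coordinates that matched it."""
    grouped = defaultdict(list)
    for item in results:
        coord = item["coordinate"]
        grouped[item["id"]].append((coord["easting"], coord["northing"]))
    return dict(grouped)


hits = coordinates_by_feature(results)
for feature_id, coords in hits.items():
    print(feature_id, coords)
```

When several maps are queried at once, the grouping key would also need map and mapset, since an id identifies a feature only within one map.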

In the context of coordinates, I can see two future directions (not for this PR): a completely new tool that would work in a more query-result style, with deduplication and on a specific layer, giving significantly simpler results without any query geometry information (no coordinates key); and more query options in v.what. Notably, v.edit already has bbox and polygon options, which use the same library functions v.what already uses (Vect_select_lines_by_polygon, Vect_select_areas_by_polygon).

Signed-off-by: Nishant Bansal <[email protected]>
Signed-off-by: Nishant Bansal <[email protected]>
@petrasovaa
Contributor

I apologize that I didn't specify this earlier, but we need to keep the old format and add the new one for backwards compatibility. With the -j flag you would get the old format (which would be marked deprecated), and with format=json you get the new format.

@cwhite911
Contributor

To start, I think we should go with Alternative 2; it provides backwards compatibility and removes the connection-detail bloat from the JSON.

@NishantBansal2003
Contributor Author

I apologize that I didn't specify this earlier, but we need to keep the old format and add the new one for backwards compatibility. With the -j flag you would get the old format (which would be marked deprecated), and with format=json you get the new format.

Yeah, I actually thought about asking but then I assumed the previous JSON output might not have been correct, which is why we weren’t going with a backward-compatible solution. But never mind, I understand now that every change should be backward-compatible. That was a mistake on my part as well.

Working on fixing it.

@wenzeslaus
Member

I realized that the documentation needs a major update, and because it was too much to include here, I created a new PR which assumes we merge this PR: #6266. If there are any further changes, I will make sure to update #6266 after we merge this PR.

@wenzeslaus
Member

previous JSON output might not have been correct

Right, it's just that with one coordinate pair and one map it was still okay (100% valid JSON).

@NishantBansal2003
Contributor Author

One question I have about Alternative 2: is the -i flag (Print attribute database connection information) only for the new JSON format, or should I add it for all formats? (For backward compatibility, we would need to always set it to true in that case.)

@wenzeslaus
Member

...is the -i flag only for the new JSON format, or should I add this for all formats?

Only for format=json. -j and -g should behave as if it were enabled, for backwards compatibility reasons. -j and -g can basically use the original code as is.

For the default output (and also with format=plain), I would say you can implement -i because there are no backwards compatibility concerns there (the parsable formats were -g and -j).

petrasovaa previously approved these changes Aug 29, 2025
Contributor

@petrasovaa petrasovaa left a comment

I made some small adjustments and I think it's good to go! Thank you!

Once this is in, we also need to update the querying in the GUI to switch away from the -j flag.

Signed-off-by: Nishant Bansal <[email protected]>
@petrasovaa petrasovaa merged commit 3d5f1c0 into OSGeo:main Sep 2, 2025
27 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in GRASS JSON Outputs Sep 2, 2025
@github-actions github-actions bot added this to the 8.5.0 milestone Sep 2, 2025
wenzeslaus added a commit to wenzeslaus/grass that referenced this pull request Sep 4, 2025
The description attribute was repeated, but it was supposed to be label. Introduced in OSGeo#6252. Found by Coverity Scan.
wenzeslaus added a commit that referenced this pull request Sep 4, 2025
The description attribute was repeated, but it was supposed to be label. Introduced in #6252. Found by Coverity Scan.
Development

Successfully merging this pull request may close these issues.

[Feat] v.what needs better JSON format

4 participants