Keep a "parent" pointer when loading deeply nested data #792

Open
hoopes opened this issue Jan 3, 2025 · 3 comments

hoopes commented Jan 3, 2025

Description

I'm not sure how feasible this is, but I wonder if there is a way to keep track of an object's "parent". I have huge files of deeply nested JSON, and occasionally it would be hugely beneficial to know exactly where an object lives in the data. One way to do this would be to walk "up" the tree until we can orient ourselves.

Perhaps there is a better way to do this? Or does the functionality exist today to accomplish this? I've attempted to experiment with some awfully hacky ways to get this done, but haven't come up with anything worthwhile yet. Any advice is greatly appreciated.

Thanks!

@rafalkrupinski

That depends on how you process your data.

What you're essentially doing is

root['field']['other-field']['etc']

Nothing is stopping you from storing the parent

parent = root['field']['other-field']
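
A minimal sketch of that idea on plain nested containers, assuming the path to the value is already known. The helper name `get_with_parent` and the sample data are hypothetical, chosen only for illustration:

```python
from typing import Any


def get_with_parent(root: Any, path: list) -> tuple[Any, Any]:
    """Follow `path` from `root` and return (parent, value).

    `parent` is the container that directly holds `value`,
    or None when `path` is empty.
    """
    parent = None
    current = root
    for key in path:
        # Remember the container before stepping into it.
        parent, current = current, current[key]
    return parent, current


root = {"field": {"other-field": {"etc": 42}}}
parent, value = get_with_parent(root, ["field", "other-field", "etc"])
```

The same helper works for list indices, since `current[key]` accepts an integer when `current` is a list.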

hoopes commented Jan 4, 2025

I guess I could be more specific here :)

Imagine we have a JSON file, test.json, with the following content:

{
  "name": "Parent 1",
  "children": [
   {"name": "Child 1"},
   {"name": "Child 2"}
  ]
}

And this is the code that loads the JSON file into Struct objects:

from pathlib import Path
import msgspec

class Child(msgspec.Struct):
    """Child."""
    name: str

class Parent(msgspec.Struct):
    """Parent."""
    name: str
    children: list[Child]

# Decode the whole file into typed Struct objects.
json_data = Path('./test.json').read_text()
root = msgspec.json.Decoder(Parent).decode(json_data)
# The child holds no reference back to `root` or to its index in the list.
child = root.children[0]

Now I have a reference to a child object. With that child object in hand, I'd love to be able to trace my way back to the root of the tree, or otherwise orient myself within the larger body of data. A bonus would be, given that child, to know its index in the list it lives in, or its key in the dict it lives in.

These are probably far-fetched asks, so if this is obviously beyond what the library author intends to support, that's fine. I just thought I'd ask whether there's a way to support this type of thing now. Perhaps during the decode, as we reach each object while parsing the JSON data, we could have some information about where we are, and the user could be responsible for storing that data however they see fit.

rafalkrupinski commented Jan 5, 2025

Your structure is still non-recursive, so it can only represent a two-level tree: a single Parent instance with a single level of Child instances :)

Take for example JSON Schema, simplified to object properties:

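from __future__ import annotations  # allows the self-referential annotation
from collections.abc import Mapping

class Schema:
    properties: Mapping[str, Schema]

def traverse(current: Schema, stack: list[str]) -> None:
    # `stack` holds the property names on the path from the root to `current`.
    for name, child in current.properties.items():
        traverse(child, [*stack, name])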

Here you can trace your path down from the root using the property names on the stack, or walk back up from the other end if you push parent objects instead.
Your stack can be any structure, like list[tuple[Parent, int]].
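
As a rough sketch of how such a stack could answer the original question, the walker below visits an already-decoded tree and records, for every node, the chain of (container, key) steps leading to it. Everything here is an assumption for illustration: the names `walk`, `_children`, and `index` are hypothetical, and the demo uses dataclasses as stand-ins so that it runs without msgspec installed. For real msgspec Structs it relies on the `__struct_fields__` class attribute to list field names.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Iterator

Step = tuple[Any, Any]  # (container, key / index / attribute name)


def _children(obj: Any) -> Iterator[tuple[Any, Any]]:
    """Yield (key, child) pairs for dicts, lists, tuples, and struct-like objects."""
    if isinstance(obj, dict):
        yield from obj.items()
    elif isinstance(obj, (list, tuple)):
        yield from enumerate(obj)
    else:
        # Assumption: msgspec Struct classes expose `__struct_fields__`.
        names = getattr(type(obj), "__struct_fields__", None)
        if names is None and is_dataclass(obj):
            names = [f.name for f in fields(obj)]
        for name in names or ():
            yield name, getattr(obj, name)


def walk(obj: Any, stack: tuple[Step, ...] = ()) -> Iterator[tuple[Any, tuple[Step, ...]]]:
    """Yield every node together with the stack of steps from the root."""
    yield obj, stack
    for key, child in _children(obj):
        yield from walk(child, (*stack, (obj, key)))


# Dataclass stand-ins for the Parent/Child Structs from the thread.
@dataclass
class Child:
    name: str


@dataclass
class Parent:
    name: str
    children: list[Child]


root = Parent("Parent 1", [Child("Child 1"), Child("Child 2")])

# Keyed by id(); only meaningful while `root` is alive and unmodified.
index = {id(node): stack for node, stack in walk(root)}

child = root.children[1]
path = [key for _, key in index[id(child)]]
print(path)  # ['children', 1]
```

The last entry of `index[id(child)]` is the list holding the child plus its index, and the entry before it is the owning object plus the attribute name. Keying by `id()` is a shortcut: values that share an identity, such as small integers or interned strings, overwrite each other's entries, so the index is only reliable for container and struct nodes.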

This is rather off-topic btw

edited: added list[] to the stack structure to clarify
