Filtering a df based on a json col #4234
Replies: 2 comments
You might try extracting as text using
Hi @mamdouhmubarak, our JSON support is currently rather limited and doesn't support filtering operations directly. As rchowell mentioned, you would likely need a combination of expressions. For example:

```python
import daft

df = daft.from_pydict({
    "json": ['{"my_key": "bar"}', '{"my_key": "baz"}', '{"my_key": "foo"}']
})

# json.query returns its result as JSON-encoded text, so a string value comes back
# wrapped in double quotes; strip them before comparing against plain strings.
df.filter(
    daft.col("json").json.query(".my_key")
    .str.replace('"', '')
    .is_in(["foo", "bar"])
).collect()
```

If that doesn't work for you, you can also write a UDF and inspect the JSON manually:

```python
import json

import daft

df = daft.from_pydict({
    "json": ['{"my_key": "bar"}', '{"my_key": "baz"}', '{"my_key": "foo"}']
})

def json_row_matches(value: str, key_to_match_on: str, match_values: list) -> bool:
    # Null JSON values are treated as non-matching instead of crashing json.loads.
    if value is None:
        return False
    data = json.loads(value)
    # .get avoids a KeyError when the key is absent from this row's object.
    return data.get(key_to_match_on) in match_values

@daft.udf(return_dtype=daft.DataType.bool())
def json_key_matches(s: daft.Series, key, values):
    # Apply the row-level check to every value in the batch.
    return [json_row_matches(value, key, values) for value in s.to_pylist()]

df.filter(json_key_matches(daft.col("json"), key="my_key", values=["foo", "bar"])).collect()
```
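Because the question mentions that some values in the column are None, the per-row check is the part most likely to fail: `json.loads(None)` raises a `TypeError`. Below is a plain-Python sketch of a more defensive row check that can be tested without Daft. The helper names `json_key_in` and `filter_json_rows` are hypothetical (not part of Daft), and the sketch assumes each value is either a JSON string or None, treating malformed JSON, non-object JSON, and missing keys as non-matches.

```python
import json
from typing import Any, Iterable, List, Optional


def json_key_in(value: Optional[str], key: str, allowed: Iterable[Any]) -> bool:
    """Hypothetical helper: True if `value` is a JSON object whose `key` is in `allowed`.

    None, malformed JSON, non-object JSON, and missing keys all return False,
    so such rows are filtered out rather than raising an exception.
    """
    if value is None:
        return False
    try:
        data = json.loads(value)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or key not in data:
        return False
    # Compare with == via a list so unhashable JSON values (lists, dicts) are safe.
    return data[key] in list(allowed)


def filter_json_rows(rows: List[Optional[str]], key: str, allowed: Iterable[Any]) -> List[Optional[str]]:
    """Hypothetical helper: keep only the rows whose JSON `key` matches one of `allowed`."""
    allowed = list(allowed)
    return [row for row in rows if json_key_in(row, key, allowed)]


rows = ['{"my_key": "bar"}', '{"my_key": "baz"}', None, '{"other": 1}', "not json", '{"my_key": "foo"}']
print(filter_json_rows(rows, "my_key", ["foo", "bar"]))
# ['{"my_key": "bar"}', '{"my_key": "foo"}']
```

Inside the UDF, the list comprehension would then become `[json_key_in(v, key, values) for v in s.to_pylist()]`.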
Hi everyone,
I’m working with a DataFrame that has a column containing JSON objects. Some of the values in that column are None. I’d like to filter the rows where a specific key in the JSON matches any value from a given list. What’s the best way to do that?
Thanks!