Implement access to complex columns (ARRAY, MAP, STRUCT) #20
Agreed, this would be useful, and I hope to do this soon. In most respects, the Impala SQL syntax is compatible with the HiveQL syntax used by Hive and Spark SQL, but Impala's syntax diverges when working with maps and arrays: Impala uses join syntax instead of lateral views. I think this does warrant an explanation and some examples in the README, or possibly in a separate vignette about complex types.
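To illustrate the difference, here is a sketch assuming a hypothetical table customers with an INT column id, an ARRAY<STRING> column phones, and a MAP<STRING,STRING> column attributes. In Impala, a complex column is referenced like a table in the FROM clause (an implicit join), which exposes the pseudocolumns item and pos for arrays and key and value for maps. HiveQL and Spark SQL express the same thing with LATERAL VIEW explode(...).

```sql
-- Hypothetical table: customers(id INT, phones ARRAY<STRING>, attributes MAP<STRING,STRING>)

-- Impala: one output row per array element, using join syntax
SELECT c.id, p.pos, p.item AS phone
FROM customers c, c.phones p;

-- Impala: one output row per map entry
SELECT c.id, a.key, a.value
FROM customers c, c.attributes a;

-- HiveQL / Spark SQL: the same results expressed with lateral views
SELECT id, phone
FROM customers
LATERAL VIEW explode(phones) p AS phone;

SELECT id, k, v
FROM customers
LATERAL VIEW explode(attributes) a AS k, v;
```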
Thank you for your answer and this cool package, which definitely eases my transition towards bigger data sets. :) Do I understand correctly that the currently suggested way to access complex types would be to execute SQL queries (via …)? (I was hoping that there would be some pipeable dplyr front end to extract parts of a complex type, similar to …)
Currently, for array and map columns, yes, I think that's right. I'm doing some experimental work right now to implement access to array and map columns in a dplyr-friendly, tidyr-style way. I'll let you know when I have something ready for you to try.
Sounds interesting, and I will surely take that idea for a test drive. It would be nice to stay within one system, even though the Impala SQL syntax seems surprisingly accessible as well.
I implemented support for complex columns in the development version of implyr on GitHub. This is currently an experimental feature; the implementation is somewhat hacky, and there are some limitations, described below. @henningsway, can you try this out on your data? First, install the latest implyr from GitHub: … Current limitations: …
This sounds exciting. I will try it soon and let you know.
I plan to implement support for arbitrarily many nested levels of complex types, but I wanted to solve the simpler case first. There are some complications and design choices I need to consider more; for example, should a single call to the …
This is an interesting question. At first I thought that repeated calling of …
I've tried to get the unnesting working, but I haven't been successful yet. This is the kind of code I tried:
This is the kind of error I got:
I noticed that the complex column is not represented in … PS: I think it may have to do with the complex column having a complex column nested within itself. I will try to find another example to work on soon and get back to you.
Thanks for testing this! There was a bug in how column names were quoted. I just fixed this in 21a8557, which should resolve the error you observed. However, I have not yet implemented support for multiple levels of nesting (complex columns within complex columns). I'm hoping to do that soon.
Just started using …
TBD: Look at tidyverse/dbplyr#158 and test with Impala.
TBD: Investigate whether this can be redesigned for consistency with tidyr's new …
As a beginner, it is not immediately clear to me how best to use implyr to access Impala complex types (especially maps, e.g., to pull out a couple of columns and join them with the existing data frame).
The link in the README is helpful (for writing dbGetQuery() requests), but a short example, possibly showing dplyr logic, would be really cool as well. :)
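For the map case, one plain Impala SQL approach (the kind of query that could be sent with dbGetQuery()) is sketched below. It assumes a hypothetical table customers(id INT, attributes MAP<STRING,STRING>) and hypothetical map keys 'color' and 'brand'. Conditional aggregation turns the selected map entries into ordinary columns, giving one row per id that can then be joined with other data.

```sql
-- Hypothetical table: customers(id INT, attributes MAP<STRING,STRING>)
-- Pull two map entries out as ordinary columns, one row per customer.
SELECT c.id,
       MAX(CASE WHEN a.key = 'color' THEN a.value END) AS color,
       MAX(CASE WHEN a.key = 'brand' THEN a.value END) AS brand
FROM customers c, c.attributes a
WHERE a.key IN ('color', 'brand')
GROUP BY c.id;
```

Because the implicit join and the WHERE filter are inner operations, customers whose map contains neither key do not appear in this result; joining it back to the base table with a left join keeps them.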