challenges with readability and application logic with interactivity #35
Comments
and I see a 👍 from @kafonek, please feel free to add your thoughts / updates here too! |
Thx for opening this one as an issue @lheagy! The Voilà/dashboard interaction is going to require a bit of thought, as it shows quite well the tension between linear execution flows with clear stopping points (where a solution like |
A few thoughts to start the conversation until we have a chance to iterate in person in an upcoming meeting... @lheagy and I brainstormed a little on this, and we wondered if thinking more explicitly about the execution graph implicit in a notebook would be the right path. A notebook has, implicitly, a linear graph where each cell is a node, and to first order each cell depends on its parent (though this dependency may not be real if the code in two given cells commutes).
The hard part about this issue is that, in "normal" notebook usage, the user has explicit, manual control over the state of this execution graph, and is responsible for keeping it consistent without seeing the actual dependency graph. But a dashboard view only shows a few nodes in the graph (those cells meant for display/manipulation), yet it's possible that these nodes depend on others to be consistent/meaningful. This is where the "callback hell" comes in: the dashboard author is responsible for ensuring that any action those visible nodes can take is always consistent, and re-executes whatever is needed on the hidden nodes to avoid inconsistencies.
So perhaps a useful path ahead would be to expose in, say, a table-of-contents-like sidebar, a structure of these dependencies in the graph, flagging the visible and hidden parts (if to be shown as a dashboard), and letting the user know when a cell may require re-execution of others, or is a prerequisite for more to be loaded later (say a filename is needed to move forward). So perhaps before we go into full Voilà/dashboard questions, we can consider as a starting case a notebook with a few cells: some that only need to be run once (say imports), then one that is a dependency for the rest, then a widget that uses this for producing an output. That simple 3-part structure can then be shown on a sidebar as a graph, and we can explore: a) what UI/UX would help users manage that workflow, even if it's done as a normal notebook, no dashboard in sight.
To reduce the need for as much callback management, we might imagine in this kind of setup, having the ability to automatically wire groups of cells to always execute together, or execute when their dependencies change, so the user doesn't have to encode all that logic explicitly in callbacks managed by widget controls. This is just some initial thinking to get us going, curious to hear what @kafonek thinks as a starting point, plus the rest of your thoughts/notes as we know you've been working on this quite a bit... |
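To make the 3-part starting case above a bit more concrete, here is a minimal sketch in plain Python. It is not an existing Jupyter or Voilà API: the `CellGraph` class, its method names, and the three toy cells are all hypothetical, and the dependencies are declared by hand rather than detected. The idea it illustrates is only that changing one cell re-executes that cell plus everything downstream of it, and nothing upstream.

```python
from collections import defaultdict, deque


class CellGraph:
    """Hypothetical toy execution graph: each cell is a node with declared dependencies."""

    def __init__(self):
        self.funcs = {}                    # cell name -> callable taking the shared namespace
        self.deps = {}                     # cell name -> names of the cells it depends on
        self.children = defaultdict(list)  # cell name -> cells that depend on it
        self.namespace = {}                # stands in for the kernel's global scope
        self.log = []                      # order in which cells actually ran

    def add_cell(self, name, func, depends_on=()):
        self.funcs[name] = func
        self.deps[name] = tuple(depends_on)
        for parent in depends_on:
            self.children[parent].append(name)

    def downstream(self, name):
        """Return `name` and every cell that depends on it, in a valid run order."""
        affected, stack = set(), [name]
        while stack:
            current = stack.pop()
            if current not in affected:
                affected.add(current)
                stack.extend(self.children[current])
        # Topological sort (Kahn's algorithm) restricted to the affected cells.
        indegree = {c: sum(p in affected for p in self.deps[c]) for c in affected}
        ready = deque(sorted(c for c in affected if indegree[c] == 0))
        order = []
        while ready:
            current = ready.popleft()
            order.append(current)
            for child in self.children[current]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        return order

    def run(self, name):
        """Execute `name` and then everything downstream of it."""
        for cell in self.downstream(name):
            self.funcs[cell](self.namespace)
            self.log.append(cell)


graph = CellGraph()
graph.add_cell("imports", lambda ns: ns.update(scale=2))
graph.add_cell("load", lambda ns: ns.update(data=[1, 2, 3]), depends_on=["imports"])
graph.add_cell(
    "plot",
    lambda ns: ns.update(output=[x * ns["scale"] for x in ns["data"]]),
    depends_on=["load"],
)

graph.run("imports")  # first pass: all three cells run

# The user edits the "load" cell; only "load" and "plot" are re-executed.
graph.funcs["load"] = lambda ns: ns.update(data=[10, 20])
graph.run("load")
```

With something like this in place, a widget callback would only need to call `graph.run("load")` rather than encode the re-execution logic itself. A sidebar view could be driven from the same `deps` mapping.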
I work with @kafonek so I'll share a few things -- but we haven't come up with any great solutions yet. We've mentioned this in previous discussions with you, but as background for other readers, one of the things that gives us the most trouble in our organization is that we are very rarely working from a static dataset, so almost every notebook uses widgets right up front to build a query into a remote API. I think this makes our callback situation worse than a lot of the dashboard demos we see in the community -- since Voila executes the whole notebook right away, but we need input from the user before we can do anything useful, we end up with a lot of widget code and almost everything else in callbacks. @kafonek explored papermill-voila integration a bit in notebook_restified, where you split out the widget callback hell into a separate "view" notebook that then calls a parameterized "model" notebook containing linear execution logic. In notebook_restified, the "model" notebook is treated as a function that returns a result at the end. As a variation on that theme, I was thinking about cases where the model/view separation is less clean, like if the target notebook has its own visualizations and widgets. That led to this very rough voila-papermill runner, which basically auto-generates a widget GUI for a parameterized notebook and runs it as a dashboard within a dashboard. While we do have some notebook authors that are really into making beautiful widget GUIs, I think an "auto-GUI" would appeal to quite a few as well. Obviously, implementing it as a notebook running another notebook is not ideal, and there's been a little discussion of more formal support for this kind of thing in Voila. I'm definitely intrigued by your idea of wiring cells together in an execution graph. I think there would be a lot of potential if you could keep most of your logic linear at global scope, and then your widget callbacks could be as simple as "execute the train-model cell." 
Would this be able to support something like "don't run cell x until you get value y from the user"? Do you foresee making information about the notebook itself (e.g. cell id/name) available to the kernel or are you thinking it would be some other mechanism? (I hesitate to share this because this approach ended up being too clunky, but I tried something along these lines here.) |
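One way to picture the "don't run cell x until you get value y from the user" question is the sketch below. It is purely illustrative and assumes nothing about how Jupyter, Voilà, or ipywidgets would expose this: `DeferredCells`, `cell`, `execute`, and `provide` are made-up names, and the "cell" is just a Python function. A widget callback would reduce to `nb.provide(...)` or `nb.execute("train-model")`.

```python
class DeferredCells:
    """Hypothetical registry that runs named 'cells' only once their inputs exist."""

    def __init__(self):
        self.values = {}    # values collected from the user (e.g. by widgets)
        self.cells = {}     # cell name -> (function, names of required values)
        self.pending = []   # cells requested before their inputs were ready
        self.results = {}   # cell name -> return value of its last run

    def cell(self, name, requires=()):
        """Decorator that registers a function as a named cell."""
        def register(func):
            self.cells[name] = (func, tuple(requires))
            return func
        return register

    def execute(self, name):
        """Run the cell if possible; otherwise defer it and return what is missing."""
        func, requires = self.cells[name]
        missing = [key for key in requires if key not in self.values]
        if missing:
            if name not in self.pending:
                self.pending.append(name)
            return missing
        self.results[name] = func(**{key: self.values[key] for key in requires})
        return []

    def provide(self, **values):
        """Record user input, then retry any cells that were waiting."""
        self.values.update(values)
        waiting, self.pending = self.pending, []
        for name in waiting:
            self.execute(name)


nb = DeferredCells()


@nb.cell("train-model", requires=["filename", "epochs"])
def train_model(filename, epochs):
    return f"trained on {filename} for {epochs} epochs"


first_attempt = nb.execute("train-model")  # nothing provided yet, so it is deferred
nb.provide(filename="data.csv")            # still waiting on "epochs"
nb.provide(epochs=3)                       # all inputs present, the cell runs
```

The design choice here is that the cell never blocks the kernel: a request that cannot be satisfied is queued and retried when new values arrive.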
@fperez https://github.com/jupytercalpoly/reactivepy was a project by the calpoly interns a few years ago to automatically detect this 'dependency graph' and re-execute when necessary. Might be worth looking into. |
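For readers wondering what "automatically detect" could look like, here is a rough sketch using only Python's standard `ast` module. It is not how reactivepy is implemented (I have not checked its internals); the function names are hypothetical, and the analysis is deliberately naive: it ignores scoping, so names assigned inside functions count as cell-level definitions, and it treats any use of a name defined by an earlier cell as a dependency.

```python
import ast
import builtins


def names_defined_and_used(source):
    """Return (defined, used) names for one cell's source, ignoring scoping."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as b" binds "b"
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
    return defined, used - set(dir(builtins))


def infer_dependencies(cells):
    """Map each cell index to the earlier cell indices it appears to depend on."""
    last_definer = {}  # name -> index of the most recent cell that defined it
    deps = {}
    for index, source in enumerate(cells):
        defined, used = names_defined_and_used(source)
        deps[index] = sorted(
            {last_definer[name] for name in used if name in last_definer}
        )
        for name in defined:
            last_definer[name] = index
    return deps


cells = [
    "import math",
    "radius = 2.0",
    "area = math.pi * radius ** 2",
    "print(area)",
]
deps = infer_dependencies(cells)
```

Here `deps` records that the third cell depends on the first two and the fourth on the third, which is the information needed to decide what to re-execute when a cell changes.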
@lheagy, I'm afraid I don't have much to say beyond what I discussed with you and @fperez and @choldgraf just before the pandemic (so roughly ten years ago). @jeffyjefflabs thanks for posting your write up.
I am glad Fernando made the distinction between Notebooks/workflows that have clear beginnings and endings versus those that are designed to show you an interactive visualization. Our struggle is entirely with the former; UI code is the point in the latter, so of course it should be front and center. Using Widgets just to gather user input / parameters and then running a linear workflow is distracting and problematic.
We see "the widget issue" acutely because of our environment, not because of any innate problem with ipywidgets itself. For instance, it is hard/impossible to create widely-accessible intermediate data sets, so most of our Notebooks have to run the gamut from querying raw data, cleaning it up, and then visualizing it -- that makes for very long/complicated code set inside callbacks. If we were just formatting/visualizing clean data, Widgets would be awesome. Savvy coders also can't just run Notebooks on behalf of a user and deliver results because they do not have the same access/authorities, when it is not outright forbidden by raw-data-sharing policy.
Our most used/most popular Notebooks are often transformed into Widget-based UIs so that they appeal to users who are uncomfortable looking at code. When new Jovyans begin to write code themselves, they tend to look at the most popular Notebooks for style inspiration. I recently gave a talk internally called "I don't like Widgets" where I urged up-and-coming Pythonistas to think hard about the trade-offs of using Widgets in their own Notebooks. @somedave has cajoled me into recreating a few memes from that talk here for your enjoyment.
I worry that from an author's perspective, using Widgets hamstrings the process of exploratory data analysis; makes maintaining or handing off a Notebook to other developers difficult; and is a debugging nightmare. I also worry that hiding code, or structuring it in a way that makes it incoherent, is smothering the spark of curiosity that lures our business analysts towards self-teaching programming.
Moving code out of a Notebook and into a package is one good way to clean up a Notebook. I am fully supportive of that if the package is documented well, and especially if the code is generalizable and re-usable. However, the point is still to make code tell a story of how something is being done, and to let users explore within that story. Some of our dev teams are maintaining two copies of Notebooks -- one that is modeled for EDA / debug, and one that serves out a UI. Treating a Notebook like a function with parameterizable input variables and a return value, the
As the last meme suggests, I am still interested in Rich Input. I apologize for not putting any real life force into creating a JEP on the topic. Has anyone else pushed that thread, @fperez? |
I see @kafonek conveniently left out two additional memes used at the end of his "I don't like Widgets" presentation. But in all seriousness, we've seen in our use-case the multiplying power of user-friendly widget-based notebooks, which greatly increase the size of our Jupyter user community. As I highlighted in my recent JupyterCon presentation, we have over 12,000 unique users of Jupyter notebooks even though only 2,000 of those are authoring notebooks - the rest are simply running existing notebooks. The more user-friendly the experience is to run those notebooks, the more likely it is we'll keep growing the size of that community.
Some of our authors have gotten very creative with how they tailor their notebooks for use by code-shy analysts. I know I have shared this with you all before @choldgraf @lheagy @fperez, but I thought including it here for others who read this ticket would be helpful: this example shows one of our more advanced widget-based UI notebooks (although this instance of the notebook is simply a mock-up and it won't perform any actions beyond seeing the interactive GUI).
To @kafonek's point - this notebook and others like it tend to be some of the first notebooks that new users see, and so it's natural that they would look to learn from this notebook how to write their own. But clearly the level of widget code used is overwhelming to any level of user, and it would be extremely difficult to learn the application logic section of the code, not to mention all of the other challenges Matt outlined above. I was really excited to see this topic brought up in this ticket and would love to work to identify a solution here. |
I wanted to flag a few notes from our meeting with @kafonek (about a year ago at this point) just so we have them noted for the continued discussion:
Audience
Challenge
Possible solutions
General conversation on introspection/global scope in Notebooks
|
Thanks for the conversation yesterday; I thought it might be useful to summarize here as well. I think we talked about four general strategies:
Let me know if I missed anything! |
Thanks @jeffyjefflabs! One addition that follows from @choldgraf's comments in the conversation is the possible role of JupyterBook and Thebe in generating interactive pages from notebooks that can be run in a step-by-step manner. |
@jeffyjefflabs those look great to me. As @lheagy mentioned I think there is an interesting opportunity to explore using something like Jupyter Book (or Sphinx, which Jupyter Book uses) to generate pages that could control the flow of commands etc as described in the meeting. I'm not sure exactly what the pattern would look like but it seems promising as long as you're OK with not having everything inside the notebook interface. If you wanna get some inspiration for how this kind of thing could be done, see these examples:
|
As a bit of a next step in this space, would it be helpful to have a few concrete examples of the style of workflows we want to be able to make progress on? Here are a couple of examples / categories that I have in mind (each with a binder link):
Are any of these good analogs to the types of workflows that are challenging? |
Yes, definitely some familiar patterns from our perspective! For the code-heavy style, DC-1d-smooth-inversion-code.ipynb is a perfect example. I like how the user parameters are collected into a cell near the top -- we're not always that clean about it, but same idea. The big difference is that this notebook can set reasonable defaults for everything (e.g. pick one of several csv files), whereas most of our notebooks have no suitable defaults and the rest of the notebook is effectively "undefined" -- best case a "Run All" would crash with no side effects; worst case it would send an invalid or non-compliant query into some other system. So that's where our blocking challenges come in -- either we split the notebook into two after that parameter cell (notebook_restified), or use ipython_blocking (which eliminates Voila as a productization option), or wrap the rest of the notebook in widget callbacks. For the "app" style, we haven't seen a lot of our users moving code into modules like this, but to achieve roughly the same effect, we have seen heavy use of a javascript/css hack to hide code cells and make "informal" dashboards. We also have a fair number of notebooks written for the now-retired jupyter-dashboards extension, which fortunately hasn't stopped working yet (I think we've made one small patch to it, and we're watching voila-gridstack as an eventual migration option). |
A few more questions to think through this a bit:
User workflow
How comfortable are users with the shift+enter or run button from the toolbar to execute things sequentially? For example, if there was an easier-access run button (next to the cell or similar), would that simplify the user interaction at all? We could maybe imagine streamlining the running of multiple cells by allowing a sequence of cells under one heading to be run together (maybe with something next to the heading)? Or is a "Run All" workflow pretty critical?
JupyterBook + Thebe as a dashboard option?
I would be curious to hear thoughts on JupyterBook + Thebe as a potential option for sequential workflows. There is an example page that uses widgets here: https://pangeo-data.github.io/jupyter-earth/jupyter-resources/ecosystem/widgets.html. You can also hide cells with Jupyter Book. This currently leads to a bit of a non-intuitive interaction: in the page I linked, the imports for example are hidden, so if you just run things in order, it fails. One potential route forward would be to add some logic to JupyterBook + Thebe to run cells above if there are some hidden ones.
Developer behaviours
A couple of the features that I like about some combination of the above points are that:
What are some of the missing components?
|
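As a rough illustration of the two ideas floated above (running all cells under one heading together, and running hidden cells above a target cell first), here is a sketch that works on the notebook's JSON structure as a plain dict. Neither Jupyter Book nor Thebe provides these functions; `group_by_heading` and `with_hidden_cells_above` are hypothetical helpers, the example notebook is made up, and the set of tags treated as "hidden" is an assumption based on Jupyter Book's cell-hiding tags.

```python
# Tags assumed to mark a cell as hidden from the reader (an assumption, adjust as needed).
HIDDEN_TAGS = {"hide-input", "hide-cell", "remove-input", "remove-cell"}


def source_text(cell):
    """Notebook JSON stores cell source as a string or a list of strings."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def group_by_heading(notebook):
    """Map each markdown heading to the indices of the code cells beneath it."""
    groups = {}
    current = None  # code cells before the first heading are filed under None
    for index, cell in enumerate(notebook["cells"]):
        text = source_text(cell).lstrip()
        if cell["cell_type"] == "markdown" and text.startswith("#"):
            current = text.splitlines()[0].lstrip("#").strip()
            groups.setdefault(current, [])
        elif cell["cell_type"] == "code":
            groups.setdefault(current, []).append(index)
    return groups


def with_hidden_cells_above(notebook, target):
    """Return the hidden code cells above `target`, followed by `target` itself."""
    cells = notebook["cells"]
    hidden = [
        i
        for i in range(target)
        if cells[i]["cell_type"] == "code"
        and HIDDEN_TAGS & set(cells[i].get("metadata", {}).get("tags", []))
    ]
    return hidden + [target]


notebook = {
    "cells": [
        {"cell_type": "code", "source": "import numpy as np",
         "metadata": {"tags": ["hide-input"]}},
        {"cell_type": "markdown", "source": "# Load data", "metadata": {}},
        {"cell_type": "code", "source": "data = np.arange(5)", "metadata": {}},
        {"cell_type": "markdown", "source": "# Plot", "metadata": {}},
        {"cell_type": "code", "source": "scaled = data * 2", "metadata": {}},
        {"cell_type": "code", "source": "print(scaled)", "metadata": {}},
    ]
}

groups = group_by_heading(notebook)
run_order = with_hidden_cells_above(notebook, 4)
```

A "run this section" button next to a heading would execute the indices in `groups[heading]`, while `run_order` shows the hidden imports cell being pulled in ahead of the cell the reader clicked on. Note that this only accounts for hidden cells, not for visible cells the reader skipped.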
@lheagy just a note that you can tag cells to auto-execute when thebe is initialized: https://jupyterbook.org/interactive/launchbuttons.html#running-cells-in-thebe-when-it-is-initialized |
Ah, thanks @choldgraf! (just needed to finish reading the docs ;) |
I came across bamboolib today, which seems to make aggressive use of widget-like machinery (I only saw their videos, didn't play with the tool), but still seems to produce code under the hood that would then allow for linear execution. The workflow in the demo video is in line with this discussion, flagging it here for reference in case it helps us think through. |
Notes for myself
I've discussed this with @fperez and wanted to keep track of some notes related to this issue. A key challenge relates to blocking at cell execution while having emitted widgets that the user wants to interact with etc, and then unblock....
|
Related to the dashboards work in #15: in the discourse post from @kafonek on Thoughts and Experiences from using Jupyter in Enterprise, they discuss some of the challenges in generating interactive workflows.
This is something we also encountered in generating some of the geosci notebooks. We parsed out the code into modules that are imported to hide that from the user, but that doesn't solve the problem of allowing that code to be an avenue for folks who are curious to learn about the implementation.
They proposed a couple early-stage solutions:
It would be interesting to see if this is something we can make some progress on.
@alicecima, @EMscience: you both have been working on more interactive workflows, what are your experiences?