Reorganise sections? #69
Just noticed that some of this may already have been addressed in #65 |
We should probably merge that; we were waiting for Paige's review. Then we can use it as a starting point for further changes. The structure can definitely be improved and, as you said, as we add content some of the initial structure might not make sense anymore. |
Now that I'm reading this more carefully, the formatting changes were addressing inconsistencies at the notebook level, not a restructuring of this kind. I like the idea of restructuring, and I'm also finding it difficult to separate some of the new sections from the data storage and computation parts. I'd like to have a go at a draft roughly following Dougie's suggestions in a separate branch, so we can share it at the meeting on Thursday; otherwise it might be tricky to visualise this. |
After starting the process, I'm wondering if we should have 3 broad categories:
Data structure, including:
Analysis/computations, including:
Platforms/tools: basically anything that defines a working environment: the platforms available, the intake, pangeo, etc. data collections, the packages and pre-defined software environments. Including:
A Resources section to close, as it is but making sure it includes all the materials we are listing elsewhere; there's already a section for example workflows in there. As said before, I'm trying to get an example of this ready before Thursday, as it might help us visualise the final product and make it easier to move sections around. |
Thanks @paolap. Yes, it could be more extensible to have the 3 categories you suggest. One thing I notice with your new proposal is that some highly-related sections are now quite separated (e.g. analysis tasks and software environments). So we'd want to make sure we link things clearly in the text. Also, it's not clear to me what constitutes a "tool". For example, the section on dask sits in "Analysis/computations" in your proposal, not in "Platforms/tools". Maybe we can come up with some clear definitions to help future contributors? The big difference I see between my suggestion and yours is that mine includes a "platforms/tools" subsection within each of the "data" and "computation" sections, whereas yours breaks this out into a new section. I see pros and cons to both. Perhaps we could see what others think in our upcoming meeting? |
Yes, more clarity would be great. It is probably the terms I'm using that are making the two approaches look more different than they are. Also, my approach keeps shifting the more I try to fit what we've got so far into some sort of structure. So what I'm currently trialling is a bit different from what I've written, which stemmed from an attempt to apply your suggestions :-) The "tools" section, as it currently stands, is meant as a list of useful software that can be linked (mostly in glossary form) from other parts of the book. So while there are some comparisons between software falling in the same category, there are no actual examples of how to use any of them. Similarly, chunking appears in data format as an introduction to the topic, but it will then be expanded/demonstrated in the computations section (as Scott has basically already done in his notebook) and in the dask examples section. It will take a while to find a good structure. I'm aware that the changes I'm trying to get together in a separate branch might not work and could end up being a waste of time, but I'm finding it really hard to think of a different structure without actually moving files around, or even sections of text from one file to another. |
Great - thanks for having a stab at something. I think that's a great way to start and we can iterate from there if we want to |
I really like the ideas here for reorganizing this book! Thank you @dougiesquire and @paolap for pushing these ideas forward! I think it will be easier for me (and others) to give feedback if we can see the updates that @paolap is making in the book, so I'll hold off on comments until then. Thanks for getting a working example of this going @paolap! |
Thanks for pushing your restructured branch @paolap > https://github.com/ACDguide/BigData/tree/restructure_paola Can we recap what we think the next steps are? A further discussion of the proposed new structure here in issue #69? |
Steps from here are:
One thing we all seem to agree on is that we need clear overviews at the start of the book and at the start of each chapter, so potential users can come up to speed.
It might be nice to show, where possible, two approaches for the examples we are showing: one maybe less efficient but simpler to adopt, and a more advanced example for experienced users.
Other comments on building a book and my branch that I sent via email:
https://github.com/pabloinsente/jupyter-book-tutorial
There are a few warnings that will pop up the first time you build the book; they can usually be ignored. Subsequently, warnings are repeated only for the files you actually modified.
rm -rf _build/
Finally, I tried not to remove any content, just move it around, but I might have missed something, and there are a few things I added:
|
@paolap - FYI just subbed a trivial PR for typos, largely as a test of my GitNoob skills PRing from a patched non-main branch in a fork. |
Just noting that I am planning to have a stab at a reorganised structure early next week (probably building off @paolap's branch) - sorry for the delay in getting to this! |
I started going through and trying to reorganise according to my initial comment in this issue and now I feel your pain @paolap! I'm not sure it's sensible for us to all try and reorganise the book, as this is very fiddly/time-consuming and it will be very difficult to consolidate our attempts. Instead, perhaps it's more feasible for us to all go through the current I think the structure has improved substantially since opening this issue thanks to @paolap's effort. My notes are:
I think this restructure exercise will be most effective/easiest if we can get multiple people comparing what works or doesn’t for them. @hot007, @paigem, @Thomas-Moore-Creative, @AliciaTak, might you guys have time to make your own notes prior to our next meeting on Sept 8th (your notes might simply be "I like it exactly as it is", which would be great). |
Okay, I'm going to have a crack too. I don't think I'm familiar enough with all the content to have strong preferences, but based on a read-through of what we've got, here are my thoughts.
|
Just a note on this: "Data chunking again, this is best done as a concepts page - needs fleshing out and spell checking but that's okay. I don't think we should do the stdev/min/max/%ile etc stuff here, I don't think it adds anything - I would remove "common tasks" onward." This page and the time one were generated as copies of the computations one. Some of the content was left there as an example of how to format in the same way as the original notebook; none of the content so far is relevant. |
Hah, that explains a lot! Please ignore me then :D |
@dougiesquire your (and everyone who's contributed) restructure looks great!! Excited to discuss it more at our meeting today. |
Hi all. @paolap and I had a play about on Miro as we discussed in our last meeting. It looks like it could be a handy tool for visualising the book structure and planning any reorganisation. I've set up key levels of the current book structure as a "sitemap". I think this works quite well as we can add notes, tags and assignees to each level. I've also had a first stab at a reorganised structure. If you want to check them out before our next meeting, let me or @paolap know and we can email you the Miro link. Otherwise, hopefully we can use Miro to collectively arrive at a good structure in our next meeting! |
I've been doing a full read-through of the book to try and decide where a new section on "recommendations when using conda" might best fit. I'm wondering if some reorganisation of sections might help users and contributors. I've had a stab at an example draft outline below to start some discussion.
Please note that I'm not wedded to this proposal at all, but I thought I should mention to the team that as the current structure evolves, I'm starting to find it difficult to know where to look for specific things.
Overview - as is, but add table-of-contents providing details of what each section aims to do.
Introduction - new section pulling info from a few existing sections and providing context for what's to come. Very high-level concepts like "there are lots of computation and storage resources in Australia", "for Big Data, data storage and compute must be considered together", "for Big Data it's best to utilise compute close to the data"... Include some of the list of platforms here (https://acdguide.github.io/BigData/platforms/platforms-intro.html), but save the "analysis environments" (e.g. OOD, ARE) for the Computation section below
Data storage - overview taken from https://acdguide.github.io/BigData/data_storage.html#about-large-scale-data
Computation - generic overview taken from https://acdguide.github.io/BigData/computations.html#general-tools, https://acdguide.github.io/BigData/computations.html#command-line-tools and https://acdguide.github.io/BigData/computations.html#other-languages-matlab-r-etc
Resources - taken from https://acdguide.github.io/BigData/resources.html#resources
I think this includes all the existing sections/information (and adds a few). I'm not sure about the division into "Data storage" and "Computation" sections because they're so heavily related. I'm interested to hear what people think (@paigem, @hot007, @paolap). Please do feel free to tell me that this is not needed.
P.S. There are also a number of "hints" and "asides" scattered throughout the book. I wonder if putting these in boxes would help with clarity (e.g. https://acdguide.github.io/BigData/accessing_data.html#working-with-authorised-catalogues)?