JOSE Review - comments on Developing a Reduced Form Specification Chapter #78

Open
kls2177 opened this issue Jun 8, 2023 · 5 comments

kls2177 commented Jun 8, 2023

Overall, this is generally well-structured and well-paced. This is not my area of expertise, so my comments may not be that insightful. One comment about structure: The "Choosing weather variables and their forms" section consists of a long list of different variables. I suggest that this could be moved to an appendix rather than being included in the main chapter. You could consider keeping the temperature sub-section as this is likely the most commonly used variable, but this list of variables seems like more of a reference rather than part of the tutorial.

My other major general comment after reading two of the Hands-On Exercises is that it would be nice to tell the student the point of the Exercises starting in Step 1 in the first Chapter and tell them that they are going to build up a model over the course of the chapters. The point of the Exercises is not introduced until Step 2. I think a bit more of a narrative around the exercises would help students to see the big picture and help them understand why they are doing the steps they are doing (e.g. sub-selecting the US in Step 1). The narrative could also potentially run through the entire content of the tutorial in order to give more context - but this is maybe too much revision.

Another general comment, about units: please use deg C, not just C. The bare C appears a few times in this chapter, at the very beginning and in the Hands-On Exercise, Step 2 section.

Minor Comments:


Temperature Sub-section:

  • should note that Tmax and Tmin are daily variables.

Humidity Sub-section:

  • it is worth noting that absolute and specific humidity are very similar.

Precip Sub-section:

  • sentence starting with "However," there is a missing "can" -> "...can vary...".
  • might be worth showing a plot of the histogram of daily precipitation (wet days only). Also it might be worth noting that precipitation can be much more Gaussian when aggregated in time, e.g. monthly precip.
  • you mention a data set, HadEX2, that has data specifically for extremes. It might also be worth mentioning the definitions of extremes often used for weather and climate data (ETCCDI: http://etccdi.pacificclimate.org/list_27_indices.shtml)
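
The point about temporal aggregation can be illustrated with synthetic data. The sketch below is not taken from the tutorial: it assumes that wet-day amounts follow a gamma distribution (an illustrative choice), uses hypothetical values for the wet-day probability, shape, and scale, and compares the sample skewness of wet-day daily totals with that of 30-day totals.

```python
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment); 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_daily_precip(n_days, wet_prob=0.3, shape=0.7, scale=8.0, seed=0):
    """Synthetic daily precipitation (mm): dry with probability 1 - wet_prob,
    otherwise a gamma-distributed amount. All parameters are hypothetical."""
    rng = random.Random(seed)
    return [
        rng.gammavariate(shape, scale) if rng.random() < wet_prob else 0.0
        for _ in range(n_days)
    ]


daily = simulate_daily_precip(n_days=30 * 600)
wet_days = [p for p in daily if p > 0]

# Aggregate into consecutive 30-day blocks as a stand-in for calendar months.
monthly = [sum(daily[i:i + 30]) for i in range(0, len(daily), 30)]

daily_skew = skewness(wet_days)
monthly_skew = skewness(monthly)
print(f"skewness of wet-day totals: {daily_skew:.2f}")
print(f"skewness of 30-day totals:  {monthly_skew:.2f}")
```

A skewness closer to zero for the aggregated series is one simple indication that it is closer to Gaussian; a histogram of `wet_days` and `monthly` would show the same thing visually.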

SST Sub-section:

  • here you spell out MODIS, but you actually refer to MODIS (w/o spelling it out) in the NPP section above. I suggest spelling it out in the NPP section and not the SST section.

Transformation before aggregation Sub-section:

  • can you make the equations span the entire line instead of wrapping?
  • It seems that you assume that the beta coefficients do not depend on p. Would this depend on the specific analysis?

Aggregation before transformation Sub-section:

  • spatial weighting may be needed in this step and could be noted here.
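
To make the difference between the two orderings concrete, here is a minimal sketch with one region containing three grid cells. The temperatures and population weights are hypothetical, and the quadratic term stands in for whichever nonlinear transformation is used. Transforming each grid cell before aggregating is the natural choice when the same coefficient is assumed to apply in every grid cell, which is the assumption raised in the comment above.

```python
def weighted_mean(values, weights):
    """Weighted average of values, e.g. population- or area-weighted."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


temps_deg_c = [10.0, 20.0, 30.0]      # hypothetical grid-cell temperatures
population = [1000.0, 5000.0, 500.0]  # hypothetical spatial weights

# Transformation before aggregation: square each grid cell, then average.
transform_then_aggregate = weighted_mean(
    [t ** 2 for t in temps_deg_c], population
)

# Aggregation before transformation: average the grid cells, then square.
aggregate_then_transform = weighted_mean(temps_deg_c, population) ** 2

print(transform_then_aggregate, aggregate_then_transform)
```

The two results differ by the weighted variance of temperature across grid cells, so they coincide only when the region is spatially uniform. Spatial weights enter both orderings through `weighted_mean`.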

Bins Sub-section:

  • I don't really understand this section in terms of how it relates to the previous section, i.e., y_js = f(Tps). Is this just about creating histograms? Perhaps you could refer to the link in the section title somewhere as an example. I didn't have time to read through this paper to better understand what is going on - maybe expanding this section a bit more to show how the temperature bins relate to the predictand (e.g. mortality) would be helpful.
  • what happened to the s time index?
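
For readers with the same question, one common reading of a binned specification (an assumption here, not a statement of what the chapter does) is that the regressors are the number of days in a region and period that fall in each temperature bin, and the outcome is then regressed on those counts. The sketch below builds such counts; the bin edges and daily temperatures are hypothetical.

```python
from bisect import bisect_right


def bin_day_counts(daily_temps_deg_c, edges):
    """Count days per temperature bin.

    With edges [e0, ..., ek], the bins are (-inf, e0), [e0, e1), ...,
    [ek, +inf), so the result has len(edges) + 1 entries.
    """
    counts = [0] * (len(edges) + 1)
    for t in daily_temps_deg_c:
        counts[bisect_right(edges, t)] += 1
    return counts


edges = [0.0, 10.0, 20.0, 30.0]  # hypothetical bin edges in deg C
daily = [-2.5, 4.0, 9.9, 10.0, 15.2, 21.0, 29.9, 30.0, 33.1, 18.0]

counts = bin_day_counts(daily, edges)
print(counts)
```

Each region and period contributes one such vector of counts, which plays the role that the polynomial terms play in the previous section.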

Cross-Validation Sub-section:

  • you mention cross-validation several times before getting to this section. I suggest that you link to this section when you first mention it in the 1st paragraph of the polynomial section.
  • what do you mean by eye-balling? I suggest maybe rephrasing to say "interpretability".
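
As an illustration of how cross-validation can inform the choice of polynomial order, the sketch below runs k-fold cross-validation on synthetic data. It assumes NumPy is available; the data-generating process (a quadratic response centered at 20 deg C with Gaussian noise) and the function name `kfold_mse` are hypothetical and not taken from the tutorial.

```python
import numpy as np


def kfold_mse(x, y, degree, k=5, seed=0):
    """Mean out-of-sample squared error of a degree-`degree` polynomial fit,
    averaged over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


# Synthetic example: the true response is quadratic in temperature.
rng = np.random.default_rng(42)
temp = rng.uniform(0.0, 35.0, size=400)
outcome = 0.05 * (temp - 20.0) ** 2 + rng.normal(0.0, 1.0, size=400)

cv_scores = {degree: kfold_mse(temp, outcome, degree) for degree in range(1, 5)}
for degree, score in cv_scores.items():
    print(f"degree {degree}: CV mean squared error = {score:.3f}")
```

Higher orders often give scores very close to the best one, so the cross-validation score is usually weighed together with how interpretable the fitted curve is.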

Hands-On Exercise, Step 2:

  • title: Why is it called "Understanding the Outcomes"? Maybe just "Prepping the Demographic Data".
  • population data in the code is not global, it is usa.
  • given that you are reading in GIS-type data in this Exercise, it might be useful to refer to your upcoming GIS section.
  • for python, you need the rioxarray package now, not rasterio. Note also that you need all the files contained in the .zip folder in order for the open_rasterio command to work. I am not a GIS person, so I did not know this. You mention this in the later GIS section. Given the current sequence of the tutorial, I would almost move this Exercise to after the "Generating Geographical Unit Data" sub-section of the next Chapter.
  • my pandas tables do not look like yours. I added ".reset_index()" to the groupby commands to get each column name in its own column.
  • population dataframe -> what does 'type'=3 mean? Why do you need to carry this along if you end up dropping it in the merge?
@kls2177
Author

kls2177 commented Jun 20, 2023

Now that I have read the entire tutorial, I have one other comment about the Hands-On Exercise, Step 2. You introduce the linear and quadratic forms of the temperature data in this section, but the actual exercise is downloading the population and mortality data. There seems to be a bit of a disconnect here. I think it would be helpful to provide a bit more narrative - let the students know that you will come back to the temperature transformations in Hands-On Exercise, Step 3. Or move Step 2 to the next Chapter, as suggested above.

@azharhsain
Collaborator

The "Choosing weather variables and their forms" section consists of a long list of different variables. I suggest that this could be moved to an appendix rather than being included in the main chapter.

Thank you for this helpful comment. We have kept temperature and precipitation in the main text and moved the other variables to a new subsection at the end.

Please use deg C not just C.

Fixed.

Temperature Sub-section: should note that Tmax and Tmin are daily variables.

Noted.

Humidity Sub-section: it is worth noting that absolute and specific humidity are very similar.

Good point. Noted.

Precip Sub-section: sentence starting with "However," there is a missing "can" -> "...can vary...".

Fixed.

Precip Sub-section: might be worth showing a plot of the histogram of daily precipitation (wet days only). Also it might be worth noting that precipitation can be much more Gaussian when aggregated in time, e.g. monthly precip.

Very good idea. Added precip graphs for Illinois at daily and monthly frequency to show the more Gaussian nature of the latter. Also, added a discussion on this.

Precip Sub-section: you mention a data set, HadEX2, that has data specifically for extremes. It might also be worth mentioning the definitions of extremes often used for weather and climate data (ETCCDI: http://etccdi.pacificclimate.org/list_27_indices.shtml)

Added the link.

SST Sub-section: here you spell out MODIS, but you actually refer to MODIS (w/o spelling it out) in the NPP section above. I suggest spelling it out in the NPP section and not the SST section.

Fixed.

Transformation before aggregation Sub-section: can you make the equations span the entire line instead of wrapping?

Done.

Transformation before aggregation Sub-section: It seems that you assume that the beta coefficients do not depend on p. Would this depend on the specific analysis?

Added a discussion on this issue.

Aggregation before transformation Sub-section: spatial weighting may be needed in this step and could be noted here.

Noted.

Bins Sub-section: I don't really understand this section in terms of how it relates to the previous section, i.e., y_js = f(Tps). Is this just about creating histograms? Perhaps you could refer to the link in the section title somewhere as an example. I didn't have time to read through this paper to better understand what is going on - maybe expanding this section a bit more to show how the temperature bins relate to the predictand (e.g. mortality) would be helpful.

Changed the header reference to a simpler one, which provides a definition of the binning process. Also added more explanation of the procedure, to align it with the discussions of the other functional forms.

Bins Sub-section: what happened to the s time index?

Added a note on this: the s index is omitted for notational simplicity.

Cross-Validation Sub-section: you mention cross-validation several times before getting to this section. I suggest that you link to this section when you first mention it in the 1st paragraph of the polynomial section.

Done.

Cross-Validation Sub-section: what do you mean by eye-balling? I suggest maybe rephrasing to say "interpretability".

Rephrased the sentence.

Hands-On Exercise, Step 2: title: Why is it called "Understanding the Outcomes"? Maybe just "Prepping the Demographic Data".

That seems like a good change, and we have now updated it. The original idea was to focus on the conceptual development of the data-generating-process, to fit with the other materials in the chapter. However, we agree that the main work of Step 2 is to prepare the dataset.

Hands-On Exercise, Step 2: population data in the code is not global, it is usa.

Thank you for pointing this out. We have updated the text accordingly.

Hands-On Exercise, Step 2: given that you are reading in GIS-type data in this Exercise, it might be useful to refer to your upcoming GIS section.

Added.

Hands-On Exercise, Step 2: for python, you need the rioxarray package now, not rasterio. Note also that you need all the files contained in the .zip folder in order for the open_rasterio command to work. I am not a GIS person, so I did not know this. You mention this in the later GIS section. Given the current sequence of the tutorial, I would almost move this Exercise to after the "Generating Geographical Unit Data" sub-section of the next Chapter.

We had not been aware of this update. Thank you. We have also added a statement that the other contents of the zip file from Gridded Population of the World are also needed for the .bil file to be loaded.
We have left the order of sections as it currently is with just these additional notes, to try to distribute the work of the exercise more evenly across the tutorial.
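
For readers who run into the same problem, the sketch below shows one way to make sure the .bil file sits next to the other files from the archive before it is opened. The function names are hypothetical; `rioxarray.open_rasterio` is the rioxarray call mentioned in the comment above, and it is imported lazily so that the extraction step can be run without the package installed.

```python
import zipfile
from pathlib import Path


def extract_bil_with_sidecars(zip_path, out_dir):
    """Extract every file in the archive and return the path to the .bil file.

    Extracting the whole archive keeps the companion files (for example the
    header file) next to the .bil file, which the raster reader needs.
    """
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    bil_files = sorted(out_dir.rglob("*.bil"))
    if not bil_files:
        raise FileNotFoundError(f"no .bil file found in {zip_path}")
    return bil_files[0]


def open_population_raster(bil_path):
    """Open the raster with rioxarray (third-party package)."""
    import rioxarray  # imported here so the helper above works without it

    return rioxarray.open_rasterio(bil_path)
```

Typical use would be `open_population_raster(extract_bil_with_sidecars(zip_path, out_dir))`, where `zip_path` points to the downloaded archive.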

Hands-On Exercise, Step 2: my pandas tables do not look like yours. I added ".reset_index()" to the groupby commands to get each column name in its own column.

This appears to reflect updates in pandas. We have added the .reset_index function, which makes the resulting dataset easier to understand. Thank you for that update. The tables in the tutorial have been updated.

Hands-On Exercise, Step 2: population dataframe -> what does 'type'=3 mean? Why do you need to carry this along if you end up dropping it in the merge?

The population file also contains US-wide population numbers, state-wide numbers, and county-level numbers. These are identified by a column in the data with a 1 (US), 2 (state), or 3 (county). In the code below, we label this the type column and only include type = 3 data. We could drop this before writing out the result, but felt it was better to leave out another line of code.
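
A small illustration of the two points above, using a made-up table rather than the actual population file. Only the `type` column and its codes (1 = US, 2 = state, 3 = county) come from the description above; the other column names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the population file.
population = pd.DataFrame({
    "type": [1, 2, 3, 3, 3, 3],
    "fips": ["00000", "17000", "17001", "17001", "17003", "17003"],
    "age_group": ["all", "all", "0-64", "65+", "0-64", "65+"],
    "population": [1000, 400, 60, 20, 30, 10],
})

# Keep only county-level rows.
counties = population[population["type"] == 3]

# Without reset_index, the grouping key becomes the index of the result.
indexed = counties.groupby("fips")["population"].sum()

# With reset_index, the grouping key is an ordinary column again.
totals = counties.groupby("fips")["population"].sum().reset_index()

print(indexed)
print(totals)
```

Here `indexed` is a Series indexed by `fips`, while `totals` is a DataFrame with `fips` and `population` as separate columns, which is the layout the reviewer obtained by adding `.reset_index()`.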

jrising added a commit that referenced this issue Sep 5, 2023
@jrising
Collaborator

jrising commented Sep 5, 2023

These were merged under PRs #86, #88, and #89.

@kls2177
Author

kls2177 commented Jan 30, 2024

@azharhsain

Thanks for these updates. In the Hands-On Exercise, Step 2, there are still a few 20 C rather than 20$^{\circ}$C. Please update these.

@jrising
Collaborator

jrising commented Mar 4, 2024

@kls2177 Good catch. We have now fixed the remaining "20 C" instances in commit df79311.
