# en avance: Notes from a lecture
## delivered by Dr. Elle O'Brien, 14 October 2025; University of Michigan School of Information


- Code LLMs (abbreviated here CL) are in regular use in research software: simulation, data collection, and more
- Undertrained as a Developer? You must be a Scientist!
- Established: Debugging code you didn't write is difficult
- Established: Inaccuracies in code and comments are a liability
- Science context: 'What is "testing practice"?' (lack of awareness)
- From a survey of scientists who use CL: Excerpted outcomes
  - Who are you? Life sciences, engineering, and so on down the domain line
  - What CL do you use? Ratio 3 to 1: Chatbots over GitHub Copilot (coding assistants)
    - Consequently: as Chatbots produce longer blocks of code in comparison...
    - ...one hypothesis is that Chatbot code increases the cognitive load imposed on the Researcher-Developer
  - Under what conditions do researchers work with unfamiliar coding languages?
    - Due to legacy code, moving between labs, domain tools, and the like
  - How do Researcher-Developers interact with documentation?
    - In short: they don't.
    - "Chat is 1000x easier than documentation."
    - "Why use documentation? The CL can read it and apply it for me."
  - How does research code **testing** get done?
    - Ad hoc or 'eyeball' methods; not systematic
    - This can easily lead to failure modes
  - Another common theme is incorrect mental models...
    - ...that is, on the part of the Researcher-Developer
    - "The code is looking at an Internet-based resource..."
    - ...when in fact the code is not looking at the Internet
    - Needless to say, this can produce failure modes


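To make the testing point concrete: a minimal sketch of what "systematic" checking could look like for research code, as opposed to eyeballing one output and moving on. The `zscore` function and its test values here are hypothetical stand-ins, not from the talk.

```python
# A toy analysis function standing in for research code (hypothetical example).
def zscore(values):
    """Standardize values to mean 0 and unit variance (population variance)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / var ** 0.5 for v in values]

# Systematic checks assert properties that must hold for ANY input,
# rather than inspecting a single run by eye.
def test_zscore_is_centered():
    out = zscore([1.0, 2.0, 3.0, 4.0])
    assert abs(sum(out)) < 1e-9  # mean of z-scores is zero

def test_zscore_has_unit_variance():
    out = zscore([1.0, 2.0, 3.0, 4.0])
    var = sum(z ** 2 for z in out) / len(out)
    assert abs(var - 1.0) < 1e-9

test_zscore_is_centered()
test_zscore_has_unit_variance()
```

A test runner such as pytest would discover and run functions named `test_*` automatically; the point is that the checks are written down and repeatable.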
The survey results proceeded to the relationship between perceived CL effectiveness for
productivity and facets of skill on the part of the Researcher-Developer. A summary point:
the more experience and skill people have in software development, the lower the
"productivity boost" from using a CL, even to the point of a *decrease*.


Turning to the scientific literature produced with CL collaboration: an interesting
resource is the Retraction Watch database. There are currently on the order of 10k
retractions per year, compared with roughly 3 million published papers per year.


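For scale, a quick back-of-envelope check of the rate those figures imply (both counts are rough, order-of-magnitude assumptions, not exact statistics):

```python
# Order-of-magnitude figures quoted in the talk (assumptions, not exact counts)
retractions_per_year = 10_000
papers_per_year = 3_000_000

retraction_rate = retractions_per_year / papers_per_year
print(f"retraction rate ~ {retraction_rate:.2%}")  # roughly 0.33%
```

So retractions are a small fraction of output, which is part of why a slow, quiet degradation could go unnoticed.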
In summary, the narrative suggests the following failure modes:
- The quality of scientific literature slowly and quietly degrades.
- Scientists stop using professional-caliber scientific software.
- Poor research attributable to CL use results in a public trust crisis, 'as featured in the New York Times'.


A Cautionary Tale deserving of attention and effort: as scientists, credibility is an
important part of how we operate ('philosophy of doubt'). Where to begin? The speaker
suggests, as one example, taking a grass-roots approach: "Buddy up"
with an RSE (Research Software Engineer).