- Record githash for lean and mathlib and lean github/version. Put in raw_data
- Put lean running inside context manager. Hopefully this will prevent runaway lean processes if python is closed early.
- Save all lean files in paths. This way we also save any modifications to the base lean files.
- Save files first. We want to do all easy stuff before running the long script which may fail.
- Figure out how to get lean files so as not to modifying elan toolchains
- Manually
- Automatically
- Make full workflow
- Manually
- (More) Automatically
- Make seperate process to break up the traces. For each trace, add filename and trace position.
- Make class for accessing raw_data
- Figure out how to load data faster (or at least by file so I'm not waiting as long)
- I could do the initial read, break into files, and then do the full pipeline for each file seperately?
- Run parser over file which matches parentheses and other things. Get it to run on all of mathlib.
- Cases:
- Matching
()
,[]
,begin...end
,match...end
, etc - Seperators
,
,;
,|
, etc- Turn on/off as needed. match
|
at outermost level and insidematch...end
only
- Turn on/off as needed. match
- Commands are seperators at outermost level and inside
namespace...end
,open...end
- Matching
- Groupsings: e.g.
['(', ['big', 'apple'], ',', 'orange', ')']
- Add better types to parser
- Add better position recording
- Cases:
- Combine all tactics with same (line, column, depth) and use that to do tactic recording. Then ignore index.
- Use parsing to find matching
{
for tactic symbol}
. This way can avoid cases with no children. - Test on mathlib
- Fix known issues:
-
...;[...]
- trailing commas
- dead tactics
-
<|>
-
- Add more checks
- Check that tactic strings don't subsume non-decendent tactics
- Check that blocks beginning in
begin
end withend
or, end
- Seperate out as its own thing
- Record if tactic contributes to solving goal
- Maybe do more with finding straight path through the tree
- record tactic state as string before and after
- maybe record which goals have changed? Can I hash goals? Or just compare metavariable ids?
- record environment hash
- Show possibility for more detailed tactic state recording
- make simple tactic prediction dataset
- make script to modify
parse
anditactic
in parameters- Improve parser to get relevant information
- use to find all tactic arguments
- compare to other tactic string methods