Studying centered around the "Focus Morph" #181

xofm31 · 2024-03-10T16:20:02Z

xofm31
Mar 10, 2024

@mortii I think you study in a more sentence-focused way, but I study in a very word-focused way. There are a few extensions/changes to the way anki-morphs currently works to make it more ideal for the way I study. I don't know how unique I am, so I thought I'd write up how I study and what improvements I'm considering.

To describe how I study a bit, I've been using Morphman as well as other Chinese word anki decks to build my vocabulary. In my current setup, I have the Morphman "FocusMorph" field set to "Word," which is the same field that I use for my other non-Morph decks. When I am studying, I use the "L" shortcut to find all cards with the same FocusMorph, across both my Morph and non-Morph decks. When I do recalc, I want it to find the best morph for me to study, and then order the +1 sentences by "usefulness" (to the extent that this is possible). Because I have a deck from several TV shows that has thousands of unknown morphs, and I am learning new morphs at a pretty slow rate, I rely on the Morphman Study Plan to identify the unknown morphs that are going to be most useful for me for the particular episode I'm watching, in the context of the TV series as a whole.

In order to move over to anki-morphs and keep my same flow, here are some of the things I think I'd want to / need to change. If you're not interested in these types of changes, I could keep them in my own branch of anki-morphs just for myself.

Auto-populate the "Focus Morph" or "Words" field with the am-unknowns value. At first I thought that I could just rename the am-unknowns field, but there may be some cases where am-unknowns gets updated that I wouldn't want it to be updated. Since the parser makes mistakes, I sometimes need to go and fix the morph that is populated there.
Make the "browse same morphs" feature work across non-morph decks, and also have it search for whatever is in the "Focus Morph" field, regardless of whether this is a new morph or not. A quick draft of what I am thinking is here: https://github.com/xofm31/anki-morphs/blob/714f97c106599178129f0aa62e179722fe33168f/ankimorphs/browser_utils.py#L71-L92 . I think you avoided a search like this due to issues with Japanese inflections, but this is not a problem for Chinese. If you are interested in this functionality, I could probably clean it up and add it as another way to search related cards.
Difficulty calculation. I've only been thinking about this so far, but personally, I want to make the usefulness of the morph (as measured by its spot in the frequency list) be the most important factor, with the difficulty of the sentence being a secondary criterion for the ordering. I also like the idea of throwing in a morph or two that are in the learning stage, to reinforce other cards that I'm currently learning. I think that there should be a way to combine the "word usefulness" and the "sentence difficulty" in a way that would allow for a parameter setting to select their relative importance. Some random thoughts:

For "sentence difficulty," for known words, rather than basing it on where they occur in the frequency file, I'm wondering if it makes sense to base it off the learning interval. If you have a pretty static card collection and frequency file, these probably track pretty well with each other. But if you are adding new cards and/or changing the frequency file, it feels like it could create a situation where you know relatively less frequent words. On the other hand, I don't know what I would do with cards that have been marked as known & don't have a learning interval. Additionally, with a Morphman-style study plan, the frequency list doesn't include already known morphs, so then they would all get the max penalty.
There was another discussion about implementing a measure of the grammatical difficulty. Not sure if you're still thinking about how to integrate that into your calculation?
I see that you only include the first 50k words from the frequency list. My master Chinese frequency list has almost 500k words. Do you limit it to 50k for specific performance/memory reasons, or just to have a reasonable cap?
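One way to picture the "parameter setting to select their relative importance" idea is a weighted sum of two normalized costs. This is only a minimal sketch: the function name `card_priority`, the `usefulness_weight` parameter, and the use of frequency ranks scaled by a 50k cap are assumptions made for illustration, not how anki-morphs or Morphman actually score cards.

```python
from __future__ import annotations


def card_priority(
    unknown_morph_rank: int,
    other_morph_ranks: list[int],
    max_rank: int = 50_000,
    usefulness_weight: float = 0.8,
) -> float:
    """Return a score in [0, 1]; a lower score means 'study sooner'.

    All names and the scaling are hypothetical, for illustration only.
    """
    # Word usefulness: frequency rank of the unknown morph, scaled to [0, 1].
    usefulness_cost = min(unknown_morph_rank, max_rank) / max_rank

    # Sentence difficulty: mean scaled rank of the other morphs in the sentence.
    if other_morph_ranks:
        capped = [min(rank, max_rank) for rank in other_morph_ranks]
        difficulty_cost = sum(capped) / (max_rank * len(capped))
    else:
        difficulty_cost = 0.0

    # usefulness_weight = 1.0 sorts purely by word usefulness,
    # usefulness_weight = 0.0 sorts purely by sentence difficulty.
    return (
        usefulness_weight * usefulness_cost
        + (1 - usefulness_weight) * difficulty_cost
    )
```

With `usefulness_weight=1.0` a card whose unknown morph is very frequent sorts first no matter how hard the rest of the sentence is; with `0.0` the ordering depends only on the sentence.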

I believe you're thinking about implementing a study plan. But if not, I'd probably want to do it outside of anki-morphs (I'm not a software developer, and that seems a bit too much for me). What I would do is add columns to the frequency file for the morph status (new/learning/known) and the frequency count. I was considering including a column for the frequency count for each of the txt files individually, but I'm not sure if this would be problematic if someone had hundreds or thousands of files.
Another thing I'm considering is filling in the card's fields for the word's definition and pinyin from the open source CCDICT when there is just one unknown morph.

mortii · 2024-03-11T11:10:59Z

mortii
Mar 11, 2024
Maintainer

I'm always open to new ideas that can improve the add-on :)

Auto-populate the "Focus Morph" or "Words" field with the am-unknowns value. At first I thought that I could just rename the am-unknowns field, but there may be some cases where am-unknowns gets updated that I wouldn't want it to be updated. Since the parser makes mistakes, I sometimes need to go and fix the morph that is populated there.

Wouldn't the problem you are describing still exist for those different fields too?

Make the "browse same morphs" feature work across non-morph decks, and also have it search for whatever is in the "Focus Morph" field, regardless of whether this is a new morph or not. A quick draft of what I am thinking is here: https://github.com/xofm31/anki-morphs/blob/714f97c106599178129f0aa62e179722fe33168f/ankimorphs/browser_utils.py#L71-L92 . I think you avoided a search like this due to issues with Japanese inflections, but this is not a problem for Chinese. If you are interested in this functionality, I could probably clean it up and add it as another way to search related cards.

Yeah, it's to prevent looking up cards that have different morphs even though they look the same. I'm no linguist, but Chinese is the only language I know of that doesn't have inflections, and therefore doesn't have this problem.

The only potential issue I see with adding an option to do this is that there are already many related options (only browse same unknowns, browse same morphs, browse same lemma), so it would probably have to be implemented as a toggle option instead of an ever present option, i.e. it would change the functionality of pressing "L" instead of having an additional keyboard shortcut for it.
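For reference, a cross-deck lookup on a single field can be expressed as a plain Anki browser search of the form `field:value`, which matches notes whose field is exactly that value. The helper below is a minimal sketch of building such a query; the function name `focus_morph_search` is hypothetical, it is not the linked draft, and it assumes the morph contains none of Anki's special search characters (those would need escaping, which is not handled here).

```python
def focus_morph_search(field_name: str, focus_morph: str) -> str:
    """Build a browser search for notes whose field equals focus_morph.

    Hypothetical helper: no escaping of special search characters is done.
    """
    # Wrapping the whole term in double quotes allows field names that
    # contain spaces, such as "Focus Morph".
    return f'"{field_name}:{focus_morph}"'
```

Because the query names no deck, it would match any note type that has a field with that name, which is the cross-deck behaviour described above.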

For "sentence difficulty," for known words, rather than basing it on where they occur in the frequency file, I'm wondering if it makes sense to base it off the learning interval.

I don't think this is a great idea. The position in the frequency file/collection frequency is a fairly objective metric, whereas learning interval can have all kinds of wonky values. For example: I have sometimes in the past just given some cards a "good" review even though I forgot a complicated word on it, simply because I couldn't be bothered to deal with it again. Throw the FSRS algorithm and its re-optimize parameters feature in there, and you could get very unexpected results.

There was another discussion about implementing a measure of the grammatical difficulty. Not sure if you're still thinking about how to integrate that into your calculation?

Sure, but not in the immediate future, it would require a lot of work and testing, and it could completely crash and burn at the end of it all.

I see that you only include the first 50k words from the frequency list. My master Chinese frequency list has almost 500k words. Do you limit it to 50k for specific performance/memory reasons, or just to have a reasonable cap?

Yeah, it's a semi-arbitrary number for performance reasons. If someone is trying to learn more than the top 50K words of a language then they should probably be saved from themselves, but there might be some intricacy that makes it valid that I haven't thought about.
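A cap like this can be applied while reading, so that a 500k-row list never has to be loaded in full. The sketch below assumes a CSV frequency file with a header row and the morph in the first column; that layout and the name `load_top_morphs` are assumptions for illustration, not the add-on's actual loader.

```python
import csv
import io
from itertools import islice


def load_top_morphs(csv_file, limit=50_000):
    """Read at most `limit` morphs from an open CSV frequency file.

    Assumed layout: one header row, then one morph per row in column 0.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the (assumed) header row
    # islice stops after `limit` rows, so the rest of the file is never read.
    return [row[0] for row in islice(reader, limit) if row]


# Usage with an in-memory file standing in for a real frequency list.
sample = io.StringIO("morph\n的\n是\n不\n")
top_two = load_top_morphs(sample, limit=2)
```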

I believe you're thinking about implementing a study plan. But if not, I'd probably want to do it outside of anki-morphs (I'm not a software developer, and that seems a bit too much for me). What I would do is add columns to the frequency file for the morph status (new/learning/known) and the frequency count. I was considering including a column for the frequency count for each of the txt files individually, but I'm not sure if this would be problematic if someone had hundreds or thousands of files.

Adding columns/values to the frequency files isn't trivial, not only because of the size increase of the files themselves, but also the time it takes to load and read them by the addon. That said, it might not be catastrophic to add a column or two if it significantly improves the workflow/usage of the addon. I don't immediately see how that would take care of the "study plan" feature though, could you elaborate on that?

Another thing I'm considering is filling in the card's fields for the word's definition and pinyin from the open source CCDICT when there is just one unknown morph.

I think a feature like that should be out of scope for ankimorphs--an endless addition of somewhat niche features is undesirable for maintenance reasons. That could be a cool companion addon though, I'd probably use it.

0 replies

xofm31 · 2024-03-12T00:52:21Z

xofm31
Mar 12, 2024
Author

Thanks for your responses. I'll probably wait until Ankimorphs 2 comes out, and I have time to see which of these changes I want to implement. Now that I've decided that I like jieba, I might wait to see if that gets added to Ankimorphs as well, before changing over to it. It sounds like you're mostly not interested in my ideas, which is fine. Let me know if you are at some point in the future.

Wouldn't the problem you are describing still exist for those different fields too?

I'm not 100% sure on this. I'd have to check all the places where am-unknowns gets written/reset. Morphman has a separate field for the focus morph and for the unknowns, and I don't think that they are always the same thing.

The only potential issue I see with adding an option to do this is that there are already many related options

Yeah, I felt that might be a problem.

For example: I have sometimes in the past just given some cards a "good" review even though I forgot a complicated word on it, simply because I couldn't be bothered to deal with it again.

Fair enough. Out of curiosity, do you put the am-unknowns on the front of your card, or just the sentence? I focus more on the individual word, and only look at the sentence after I've remembered the word in isolation. I'm not that interested in the sentence difficulty metric anyway, since I'd prefer to think about the word usefulness.

I don't immediately see how that would take care of the "study plan" feature though, could you elaborate on that?

If I had all of the frequency counts for all of the files, I could write a standalone python script to create a new frequency list in the order needed for the study plan. My coding isn't good enough to want to put it into Ankimorphs myself. But I don't know how much it would help the add-on itself, except for people like me who are interested to look at numbers a lot.
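A standalone script along these lines could look roughly like the sketch below. It assumes the per-file counts are already available as `collections.Counter` objects and that known morphs are given as a set; the function name `study_plan_order` and the tie-breaking rule (count in the target file first, then count across all files) are illustrative assumptions, not Morphman's actual study plan algorithm.

```python
from __future__ import annotations

from collections import Counter


def study_plan_order(
    counts_per_file: dict[str, Counter],
    known: set[str],
    target_file: str,
) -> list[str]:
    """Order the unknown morphs of one file by how useful they look.

    Hypothetical rule: most frequent in the target file first, ties broken
    by frequency across all files, then alphabetically for stability.
    """
    # Total count of each morph over every file (the "series as a whole").
    total: Counter = Counter()
    for counts in counts_per_file.values():
        total.update(counts)

    target = counts_per_file[target_file]
    unknown = [morph for morph in target if morph not in known]
    return sorted(unknown, key=lambda m: (-target[m], -total[m], m))
```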

I think a feature like that should be out of scope for ankimorphs--an endless addition of somewhat niche features is undesirable for maintenance reasons.

I understand. I am surprised that I didn't find an addon like this already. Maybe I just didn't look hard enough.

0 replies

mortii · 2024-03-12T12:22:39Z

mortii
Mar 12, 2024
Maintainer

It sounds like you're mostly not interested in my ideas, which is fine. Let me know if you are at some point in the future.

This is me being interested in your ideas right now, hehe. If I hadn't been interested then I wouldn't even try to scrutinize them. Sorry if it came across as dismissive, but it's really important to put ideas through an antifragility process in my opinion to make the end product as good as possible.

I'm not 100% sure on this. I'd have to check all the places where am-unknowns gets written/reset. Morphman has a separate field for the focus morph and for the unknowns, and I don't think that they are always the same thing.

am-unknowns gets overwritten during recalc for cards that are in the "new" state.

anki-morphs/ankimorphs/recalc.py

Lines 442 to 445 in 2437d92

    
    if config_filter.extra_unknowns:
        _update_unknowns_field(
            am_config, note_type_field_name_dict, note, card_unknown_morphs
        )

You mentioned that you want to update those fields manually, but then you can't also have ankimorphs auto-populate them, at least not in the same way as the 'am-unknowns' field, because like you said, it gets overwritten. I'm not sure there is any good way to selectively choose which fields on which cards to auto-populate. You could maybe create an exclude list and store it in the profile folder I guess, but when recalcing you would have to check each card if it is included in that list, which would negatively affect the speed of the recalc to some extent.
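The exclude-list idea could be sketched as below. Everything here is hypothetical: the JSON file of note ids, the helper names, and the choice to key on note ids are assumptions for illustration. Loading the list once into a `set` makes each per-card check a constant-time membership test on average, which may keep the recalc slowdown small, though that would need measuring.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_excluded_note_ids(path: Path) -> set[int]:
    """Load note ids whose unknowns field should not be overwritten.

    Hypothetical file format: a JSON array of integers, e.g. [123, 456].
    """
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def should_overwrite_unknowns(note_id: int, excluded: set[int]) -> bool:
    """Return True when recalc may auto-populate the field for this note."""
    return note_id not in excluded
```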

Fair enough. Out of curiosity, do you put the am-unknowns on the front of your card, or just the sentence? I focus more on the individual word, and only look at the sentence after I've remembered the word in isolation. I'm not that interested in the sentence difficulty metric anyway, since I'd prefer to think about the word usefulness.

All the example cards in the guide are from my collection, so they have the am-unknowns/focus morph on top and the sentence at the bottom.

I probably care less about the sentences than you suspect--learning unknown words with the added context of a sentence is just the best/fastest way to learn an unknown word in my experience, so sorting the sentences by difficulty is just the most efficient way to learn new words, that's all.

If I had all of the frequency counts for all of the files, I could write a standalone python script to create a new frequency list in the order needed for the study plan. My coding isn't good enough to want to put it into Ankimorphs myself. But I don't know how much it would help the add-on itself, except for people like me who are interested to look at numbers a lot.

Oh, you mean if the study plan feature isn't going to be made, then the extra columns in a frequency file would help you make a study plan yourself? Yeah, no, I definitely plan on making the study plan thing, the details just have to be ironed out and more pressing concerns have to be taken care of first.

I understand. I am surprised that I didn't find an addon like this already. Maybe I just didn't look hard enough.

I'm 95% sure I've seen the result of an addon that does this in some pre-made decks that were tailored to morphman. The decks had thousands of cards and each had a separate field for the definition of the focus morph, so I hope that wasn't done manually. The "only do it if it has one unknown morph" constraint won't apply to any addon you find ofc, so they might be of limited use if combined with ankimorphs.

0 replies

xofm31 · 2024-03-13T11:17:27Z

xofm31
Mar 13, 2024
Author

but it's really important to put ideas through an antifragility process in my opinion to make the end product as good as possible

I haven't heard this term before, but I totally agree. But I also realize that trying to make software do everything that everyone wants is a recipe for an unwieldy codebase (see: Morphman), and I know that one of your priorities is to not do that. So I expect that you will need to be selective in what you do implement.

You mentioned that you want to update those fields manually, but then you can't also have ankimorphs auto-populate them

I will need to think this through more. Thanks for the pointer to where it gets set.

learning unknown words with the added context of a sentence is just the best/fastest way to learn an unknown word in my experience

That's about the same as me. But I'm probably doing a lot fewer words/day than you (about 5), which is why I'm more focused on which words get selected.

you mean if the study plan feature isn't going to be made,

Yeah

I definitely plan on making the study plan thing

Great! I'll try to be patient. There are so many nice details with Ankimorphs that I'm looking forward to being able to switch over.

0 replies

xofm31 · 2024-03-17T22:07:17Z

xofm31
Mar 17, 2024
Author

I have sometimes in the past just given some cards a "good" review even though I forgot a complicated word on it, simply because I couldn't be bothered to deal with it again.

@mortii I am seeing what you mean about this. I was wondering why morphs are not showing up as learning in the am-highlight, when I know that I keep failing that morph. It turns out that they are on other cards, which I have given a "good" review, because I am mostly grading the am-unknowns.

I don't know how many places the morph interval is used, but I'd like to suggest a change to how the morph interval is calculated, at least for the highlighting and whether the card has a learning morph in _get_card_difficulty_and_unknowns_and_learning_status.

If the morph is in the am-unknowns field on any of the cards, the interval for that morph is that interval.
If the morph isn't in the am-unknowns field on any of the cards, then use the longest interval of any card that it's on (as it is currently done)

I think this will get around the problem that the interval of a card may not reflect the interval of all of the morphs, especially the tough ones.

1 reply

mortii Mar 18, 2024
Maintainer

If the morph is in the am-unknowns field on any of the cards, the interval for that morph is that interval.

If the morph isn't in the am-unknowns field on any of the cards, then use the longest interval of any card that it's on (as it is currently done)

I can see where you are coming from, but there are two significant problems I have with this approach:

It would lead to hidden inconsistent effects that are hard to track down for the user (and devs), which can lead to a lot of frustration and hair pulling
This will make the already way too complicated caching process explode in further complexity

xofm31 · 2024-03-18T14:05:18Z

xofm31
Mar 18, 2024
Author

So I think that what you are saying is that if there were a way to implement this that was easy to understand, and simpler, that you'd consider it? Oftentimes, the initial obvious way of implementing something isn't the optimal way (that's why software development teams have design reviews), so maybe there's a way to do this that would be acceptable. If you genuinely think that this would not make for a better user experience, that's a different question.

1 reply

mortii Mar 18, 2024
Maintainer

Sorry for my incomplete answer. Whenever I raise some concerns, in my mind I'm also implicitly saying it could potentially be added if we are able to work around/address those concerns.

This is the technical concern I have:

This will make the already way too complicated caching process explode in further complexity

This is the user experience concern I have:

It would lead to hidden inconsistent effects that are hard to track down for the user (and devs), which can lead to a lot of frustration and hair pulling

The technical concern could potentially be worked around, but it's not immediately clear to me how. The user experience concern is maybe more clear cut in how such a feature would be fundamentally problematic, but yes, if you have any counter arguments/ideas then I'm happy to take them into account/change my mind.

xofm31 · 2024-03-19T01:53:16Z

xofm31
Mar 19, 2024
Author

Thanks! Probably won't look at it for a bit, but hopefully I will eventually.

The user experience concern is maybe more clear cut in how such a feature would be fundamentally problematic

Unfortunately, pretty much any implementation of "morph interval" is problematic if there is more than one card with that morph.

1 reply

mortii Mar 19, 2024
Maintainer

Unfortunately, pretty much any implementation of "morph interval" is problematic if there is more than one card with that morph.

I think I agree with that statement, however, the current approach leads to a consistent and predictable problem. The idea you proposed might, in some ways, make the problem twice as bad since it adds an extra layer on top of the current approach.

Maybe we are talking about two separate things, so I'll clarify and say that I think the morph interval will probably be better (more representative of the user's actual knowledge) with your idea than the current one. However, whenever the user encounters scenarios where they feel that it isn't representative of their current knowledge, then they might be extremely confused as to why, and will have a lot of trouble finding a good answer for it, because of the added complexity this would introduce.

xofm31 · 2024-04-07T20:35:37Z

xofm31
Apr 7, 2024
Author

Here are some updates

Auto-populate the "Focus Morph" or "Words" field with the am-unknowns value.

I edited my own branch of the code to rename the "am-unknowns" field to "Word" and it seems to be working fine for me. Sometimes I do need to edit this field if the morphs weren't processed correctly, but only when I'm studying it, so then the card is no longer new. I don't need to have another field for it.

But it would make some other things that I'd like to do easier if the focus_morph were one of the slots in AnkiCardData. Have you considered this? I am not sure if this would require it to be one of the fields on the cards (as opposed to being an optional one right now), but it was required in Morphman.

I'd like to suggest a change to how the morph interval is calculated

I now have this working in the way that I want it to. Basically, rather than building a table as a list of all intervals from all of the morphs from all of the cards, I make a dict of the morphs and keep track of the longest interval for it when it's a focus morph (matches what's in the am-unknowns field), and when it's not a focus morph.

    for morph in card_data.morphs:
        key = (morph.lemma, morph.inflection)
        # Track the longest interval separately for focus and non-focus use.
        intervals = morph_table_dict.setdefault(key, {"focus": 0, "nonfocus": 0})
        slot = "focus" if morph.lemma == card_data.focus_morph else "nonfocus"
        intervals[slot] = max(intervals[slot], highest_interval)

Then after all the cards have been read, it writes out the table, selecting the focus_morph interval if it exists.

    for (lemma, inflection), intervals in morph_table_dict.items():
        # Prefer the focus-morph interval; fall back to the non-focus one.
        if intervals["focus"] > 0:
            interval = intervals["focus"]
        else:
            interval = intervals["nonfocus"]
        morph_table_data.append(
            {
                "lemma": lemma,
                "inflection": inflection,
                "highest_learning_interval": interval,
            }
        )

This way of calculating it is really best for me, because I ONLY grade the card on whether I know the focus morph. When I look at the morphs that went from "known" from the original code to "learning" in this version, they make sense to me. Also, in order to do this, I had to find a way to get card_data.focus_morph, which is why I'm hoping you'll agree that it could be a full-fledged slot.
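The rule described above can be tried out in isolation with a small self-contained sketch. The `Morph` and `Card` classes and the `morph_intervals` function below are simplified stand-ins invented for illustration (the real `AnkiCardData` has more fields), and the card's own interval is assumed to play the role of `highest_interval`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Morph:
    lemma: str
    inflection: str


@dataclass
class Card:
    morphs: list[Morph]
    focus_morph: Optional[str]  # contents of the unknowns field, if any
    interval: int  # assumed stand-in for `highest_interval`


def morph_intervals(cards: list[Card]) -> dict[tuple[str, str], int]:
    """Return one interval per morph, preferring cards where it is the focus."""
    table: dict[tuple[str, str], dict[str, int]] = {}
    for card in cards:
        for morph in card.morphs:
            key = (morph.lemma, morph.inflection)
            entry = table.setdefault(key, {"focus": 0, "nonfocus": 0})
            slot = "focus" if morph.lemma == card.focus_morph else "nonfocus"
            entry[slot] = max(entry[slot], card.interval)
    # Use the focus interval when the morph was ever a focus morph.
    return {
        key: entry["focus"] if entry["focus"] > 0 else entry["nonfocus"]
        for key, entry in table.items()
    }
```

In this sketch a morph that sits on a long-interval card only as a non-focus word, but is the focus of a short-interval card, gets the short interval, which matches the "known" to "learning" shift described above.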

3 replies

mortii Apr 8, 2024
Maintainer

Could you implement this in a way that it would be optional? If this feature were opt-in then we wouldn't have to make the unknowns extra field mandatory for everyone.

xofm31 Apr 8, 2024
Author

Taking a look at it, I think it would be possible to make it "optional", and not populate that field if it doesn't exist. It might be tougher to do it without including it in ankimorphs_profile_settings.json, in which case it would cause the user to have to remake their profile. If you think that's a problem, I'll have to look for another way.

xofm31 Apr 8, 2024
Author

On second thought, since the challenge I'm encountering is how to keep track of the field number based on the field name, I will wait to see what you do with #208. That may get around the need for the field in the profile settings.

mortii · 2024-04-16T11:59:06Z

mortii
Apr 16, 2024
Maintainer

Given that we haven't landed on a new name for the "focus morph" field, this will be a messy combination of field names and lookups, sorry about that.

Anyway, I was thinking we could add an option like morph_interval_based_on_focus_morph or something. So you would check:

am_config: AnkiMorphsConfig = AnkiMorphsConfig()

if am_config.morph_interval_based_on_focus_morph:
    pass  # do things

But implementing the production-ready option can be done later, just add an if True: statement or something for now to simulate it.

In the create_card_data_dict function you could add:

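  # list.index raises ValueError when the field is missing, so fall back to -1
  try:
      focus_morph_field_index: int = existing_field_names.index(
          ankimorphs_globals.EXTRA_FIELD_UNKNOWNS
      )
  except ValueError:
      focus_morph_field_index = -1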

  for anki_row_data in _get_anki_data(am_config, note_type_id, tags).values():
      card_data = AnkiCardData(
          am_config=am_config,
          tag_manager=tag_manager,
          note_type_id=note_type_id,
          expression_field_index=field_index,
          focus_morph_field_index=focus_morph_field_index,
          anki_row_data=anki_row_data,
      )
      card_data_dict[anki_row_data.card_id] = card_data

  return card_data_dict

And the AnkiCardData __init__ function could be something like this:

    def __init__(  # pylint:disable=too-many-arguments
        self,
        am_config: AnkiMorphsConfig,
        tag_manager: TagManager,
        note_type_id: NotetypeId,
        expression_field_index: int,
        focus_morph_field_index: int,
        anki_row_data: AnkiDBRowData,
    ) -> None:
        fields_list = anki.utils.split_fields(anki_row_data.note_fields)

        expression_field = fields_list[expression_field_index]

        if focus_morph_field_index != -1:
            focus_morph = fields_list[focus_morph_field_index]
        else:
            focus_morph = None

        # ---Snipped---
        
        self.expression = expression
        self.focus_morph = focus_morph

0 replies

mortii · 2024-04-18T12:16:12Z

mortii
Apr 18, 2024
Maintainer

I think there is a fundamental problem with this approach of extracting the morph(s) from the am-study-morphs field.

The output to the extra field will only contain the inflection or the lemma, which makes it lossy, and reversing the process isn't doable because the user could have switched between using lemmas or inflections any number of times in the past. This means this will only really work for Chinese.

To make this work we would either have to change how the morph is stored and/or change the retrieval process.

Another problem is obviously that there could be multiple morphs in the field, but they are comma separated, so that is very solvable.
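Splitting the comma-separated field is the easy part and could look like the minimal sketch below. The function name `split_study_morphs` is hypothetical, and the sketch does nothing about the lossy lemma/inflection issue described above: each returned string is still only whichever form was written to the field.

```python
from __future__ import annotations


def split_study_morphs(field_value: str) -> list[str]:
    """Split a comma-separated morph field into individual morph strings."""
    # Strip whitespace around each entry and drop empty pieces, so both
    # "a,b" and "a, b" (and an empty field) are handled.
    return [part.strip() for part in field_value.split(",") if part.strip()]
```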

0 replies

mortii · 2024-10-13T12:01:38Z

mortii
Oct 13, 2024
Maintainer

Not adding any new features, apologies.

0 replies
