-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update develop from master for v3.7 #13011
Closed
adrianeboyd
wants to merge
33
commits into
explosion:develop
from
adrianeboyd:chore/update-develop-from-master-v3.7-1
Closed
Update develop from master for v3.7 #13011
adrianeboyd
wants to merge
33
commits into
explosion:develop
from
adrianeboyd:chore/update-develop-from-master-v3.7-1
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Update most recent release * Switch from azure to GHA CI tests badge * Remove link to survey * Format
* initial commit * update for v0.4.0 * Apply suggestions from code review * Fix formatting * Apply suggestions from code review * Update website/docs/api/large-language-models.mdx * Update website/docs/api/large-language-models.mdx * update usage page * Apply suggestions from review * Apply suggestions from review * fix links * fix relative links * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <[email protected]> * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <[email protected]> * Apply suggestions from review * Add section on Llama 2. Format. --------- Co-authored-by: Raphael Mitsch <[email protected]> Co-authored-by: Sofie Van Landeghem <[email protected]>
Additionally remove outdated `is_new_osx` check and settings.
* Added OdyCy to spaCy Universe * Replaced template tags Co-authored-by: Adriane Boyd <[email protected]> --------- Co-authored-by: Adriane Boyd <[email protected]>
* Add cli for finding locations of registered func * fixes: naming and typing * isort * update naming * remove to find-function * remove file:// bit * use registry name if given and exit gracefully if a registry was not found * clean up failure msg * specify registry_name options * mypy fixes * return location for internal usage * add documentation * more mypy fixes * clean up example * add section to menu * add tests --------- Co-authored-by: svlandeg <[email protected]>
* Add data structures to docs * Adjusted descriptions for more consistency * Add _optional_ flag to parameters * Add tests and adjust optional title key in doc * Add title to dep visualizations * fix typo --------- Co-authored-by: thomashacker <[email protected]>
* Update universe.json added entry for Sayswho * Update universe.json updated sayswho entry * Update universe.json * Update website/meta/universe.json * Update website/meta/universe.json --------- Co-authored-by: Adriane Boyd <[email protected]>
…osion#12173) * remove migration support form * initial test commit * add fixture * add combo test * pull out parameter example data * fix formatting on examples * remove unused import * remove unncessary fmt:off instructions * only set logger level if verbose flag is explicitly set --------- Co-authored-by: svlandeg <[email protected]>
There was a mistake in the regex pattern which caused not matching all the desired tokens. The problem was that when we use r string literal prefix to suppose a raw text, we should not use two backslashes to demonstrate a backslash.
* Fix displacy br tag * Prefer <br>, also update package CLI
* Add `cuda12x` for `cupy-cuda12x`. * Drop `cuda-autodetect` from quickstart, set default to `cuda11x` instead.
Add: example sentences to improve the Turkish model. Let's get the tr_web_core_sm out in the the world yaa
* Update universe.json added hobbit-spacy to the universe json * Update universe.json removed displacy from hobbit-spacy and added a default text.
SpaCy's HashEmbedCNN layer performs convolutions over tokens to produce contextualized embeddings using a `MaxoutWindowEncoder` layer. These convolutions are implemented using Thinc's `expand_window` layer, which concatenates `window_size` neighboring sequence items on either side of the sequence item being processed. This is repeated across `depth` convolutional layers. For example, consider the sequence "ABCDE" and a `MaxoutWindowEncoder` layer with a context window of 1 and a depth of 2. We'll focus on the token "C". We can visually represent the contextual embedding produced for "C" as: ```mermaid flowchart LR A0(A<sub>0</sub>) B0(B<sub>0</sub>) C0(C<sub>0</sub>) D0(D<sub>0</sub>) E0(E<sub>0</sub>) B1(B<sub>1</sub>) C1(C<sub>1</sub>) D1(D<sub>1</sub>) C2(C<sub>2</sub>) A0 --> B1 B0 --> B1 C0 --> B1 B0 --> C1 C0 --> C1 D0 --> C1 C0 --> D1 D0 --> D1 E0 --> D1 B1 --> C2 C1 --> C2 D1 --> C2 ``` Described in words, this graph shows that before the first layer of the convolution, the "receptive field" centered at each token consists only of that same token. That is to say, that we have a receptive field of 1. The first layer of the convolution adds one neighboring token on either side to the receptive field. Since this is done on both sides, the receptive field increases by 2, giving the first layer a receptive field of 3. The second layer of the convolutions adds an _additional_ neighboring token on either side to the receptive field, giving a final receptive field of 5. However, this doesn't match the formula currently given in the docs, which read: > The receptive field of the CNN will be > `depth * (window_size * 2 + 1)`, so a 4-layer network with a window > size of `2` will be sensitive to 20 words at a time. Substituting in our depth of 2 and window size of 1, this formula gives us a receptive field of: ``` depth * (window_size * 2 + 1) = 2 * (1 * 2 + 1) = 2 * (2 + 1) = 2 * 3 = 6 ``` This not only doesn't match our computations from above, it's also an even number! This is suspicious, since the receptive field is supposed to be centered on a token, and not between tokens. Generally, this formula results in an even number for any even value of `depth`. The error in this formula is that the adjustment for the center token is multiplied by the depth, when it should occur only once. The corrected formula, `depth * window_size * 2 + 1`, gives the correct value for our small example from above: ``` depth * window_size * 2 + 1 = 2 * 1 * 2 + 1 = 4 + 1 = 5 ``` These changes update the docs to correct the receptive field formula and the example receptive field size.
* fix typo in link * fix REL.v1 parameter
* fix usage example * revert back to v2 to allow hot fix on main
* Update incorrect example config. (explosion#12893) * spacy-llm docs cleanup (explosion#12945) * Shorten NER section * fix template references * simplify sections * set temperature to 0.0 in examples * condense model information * fix parameters for REST models * set temperature to 0.0 * spelling fix * trigger preview * fix quotes * add small note on noop.v1 * move up example noop config * set appropriate model example configs * explain config * fix Co-authored-by: Raphael Mitsch <[email protected]> --------- Co-authored-by: Raphael Mitsch <[email protected]> * Docs for ner.v3 and spancat.v3 spacy-llm tasks (explosion#12949) * formatting * update usage table with NER.v3 * fix typo in links * v3 overview of parameters * add spancat.v3 * add further v3 explanations * remove TODO comment * few more small fixes * Add doc section on LLM + task factories (explosion#12905) * Add section on LLM + task factories. * Apply suggestions from code review --------- Co-authored-by: Sofie Van Landeghem <[email protected]> * add default config to openai models (explosion#12961) * Docs for spacy-llm 0.5.0 (explosion#12967) * simplify Python example * simplify Python example * Refer only to latest OpenAI model versions from usage doc * Typo fix Co-authored-by: Raphael Mitsch <[email protected]> * clarify accuracy claim --------- Co-authored-by: Raphael Mitsch <[email protected]> --------- Co-authored-by: Raphael Mitsch <[email protected]>
* fix construction example * shorten task-specific factory list * small edits to HF models * small edit to API models * typo * fix space Co-authored-by: Raphael Mitsch <[email protected]> --------- Co-authored-by: Raphael Mitsch <[email protected]>
* fix BertWordPieceTokenizer constructor call * fix * Update website/docs/usage/linguistic-features.mdx --------- Co-authored-by: Adriane Boyd <[email protected]>
…lop-from-master-v3.7-1
* add span key option for CLI evaluation * Rephrase CLI help to refer to Doc.spans instead of spancat * Rephrase docs to refer to Doc.spans instead of spancat --------- Co-authored-by: Adriane Boyd <[email protected]>
…lop-from-master-v3.7-1
Let me redo this in a bit... |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Update
develop
frommaster
for v3.7.Types of change
Chore.
Checklist