
[docs] Redesign #31757

Draft
wants to merge 93 commits into
base: main


Member

@stevhliu stevhliu commented Jul 2, 2024

The main goal of this PR is to redesign the Transformers docs to:

  1. Be more developer-friendly.
  2. Improve navigation by replacing the existing structure with a more organic one that scales naturally instead of forcing content into the 4 current predefined sections.
  3. Create a more unified docs experience by integrating content rather than adding it on.

This PR proposes a potential structure for achieving 2 and 3. Once the structure is in place, each doc will be rewritten to achieve 1.

If you're interested in more details about the redesign's motivation, please read this blog post. If you want more details about 1, 2, and 3, please read this post and this one too.

All feedback, alternative structures, and comments are welcomed! Thanks 🙂

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@gante gante left a comment


Like! 👍

docs/source/en/_toctree.yml (outdated)
title: Pipelines for webserver inference
- local: add_new_pipeline
title: How to add a pipeline to 🤗 Transformers?
- title: LLMs
Member


A type of model that's becoming increasingly common is the VLM: these are the same as LLMs, but they also accept image inputs.

Would it make sense to call this section "LLMs and VLMs"?

Member Author


For sure! Let's rename the section when we have some VLM-specific docs?

@ydshieh
Collaborator

ydshieh commented Jul 9, 2024

Indeed easier to read ❤️. But there are a few places that need to be moved, if I understand correctly?

@stevhliu
Member Author

I've kicked off the redesign with the "Get Started" section. Feel free to review this section while I start on the next one (Base classes)!

The main changes are:

index.md

  • A cleaner index page that better describes what Transformers is in terms of its features and design. I believe this is more impactful than listing all the tasks you can solve across modalities. The focus shouldn't be on the tasks that you can solve; it should be on the models themselves. By describing the type of models available, I think users will understand that they can use them for their tasks. Having a more holistic description of the library here is more important than focusing on the different tasks/modalities.
  • More of a question here, but would it be better to maybe have badges on each model API doc that indicate whether it supports PyTorch/TensorFlow/Flax? Instead of having/maintaining such a long list that clutters up the main landing page, I think it'd be a lot cleaner to have this information on each model page. This way, users can see everything at once on the model page.

quicktour.md

  • Removed the "vertical" PyTorch/TensorFlow blocks in favor of the "horizontal" ones, which I think are cleaner and less overwhelming.
  • Replaced the big table of tasks available to Pipeline with just three code examples. I think this makes it simpler and more approachable. I also removed the Pipeline video because it felt very NLP-heavy, but we can add it back if we want to keep it.
  • Updated the AutoClass section to also be simpler and faster for users to start. A lot of these details (e.g., tensors are output before the final activation function, custom model builds) can be explained in more depth in later docs. Also took this opportunity to introduce the generate API.
  • A better Next steps section directing users to topics of interest.
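
For reference, the detail mentioned in the AutoClass bullet (models output tensors before the final activation function) can be illustrated with a minimal sketch. It uses plain Python rather than the Transformers API, and the logits are made-up values for a hypothetical two-class classifier; the `softmax` helper is an illustrative assumption, not code from the docs.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the largest logit first for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a two-class classifier (illustrative values only).
logits = [-1.5, 2.0]
probs = softmax(logits)

# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
```

Applying the activation yourself is what turns the raw model output into something you can read as class probabilities.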

installation.md

  • Removed the options for downloading files in favor of just one method to keep it simple, and linked to the Download files from the Hub doc for more details.

@stevhliu
Member Author

stevhliu commented Aug 7, 2024

Hi, I'm back with an update! I've wrapped up the technical guides in the Models section. I'll circle back to the more conceptual docs later and also create some visual diagrams in Figma. Next up, I'll start working on the Preprocessors section. 🙂

The main focus is on how to load, customize, share, and contribute a model, basically a one-stop section for all your general model docs. The Load and Contribute docs have more significant changes:

models.md

  • Repurposed to show how to load a model. I start with a quick example of AutoModelFor.from_pretrained() so you can get started immediately, and then progressively peel back the layers: from how models and configurations interact, to the AutoClass API, and then to model-specific classes. To make it easier to find how to load any model, I also added big models (device_map="auto") and custom models (trust_remote_code=True) to this page.
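
As a rough sketch of the loading options mentioned above, and not the exact code from the rewritten doc: the helper names `build_load_kwargs` and `load_causal_lm` are hypothetical, and `AutoModelForCausalLM` is just one assumed choice of AutoClass. The `device_map` and `trust_remote_code` arguments are the ones named in the bullet.

```python
def build_load_kwargs(big_model=False, custom_code=False):
    """Collect optional from_pretrained() keyword arguments (hypothetical helper)."""
    kwargs = {}
    if big_model:
        # Let Accelerate decide how to place the weights on the available devices.
        kwargs["device_map"] = "auto"
    if custom_code:
        # A boolean, not the string "True"; enable only for repositories you trust.
        kwargs["trust_remote_code"] = True
    return kwargs


def load_causal_lm(checkpoint, **kwargs):
    """Load a causal language model; downloads weights from the Hub on first use."""
    # Imported lazily so build_load_kwargs() works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(checkpoint, **kwargs)
```

A call would then look like `load_causal_lm(checkpoint, **build_load_kwargs(big_model=True))`, where `checkpoint` is whichever Hub repository id you want to load. Note that `device_map="auto"` also requires the Accelerate library to be installed.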

add_new_model.md

  • Updated the structure to make the steps more discoverable. Before, many of the steps were hidden in "5.-14. Port BrandNewBert to Transformers", but now the actual steps are clearer.

@stevhliu
Member Author

Finished the first draft of the Tokenizers doc, and I'm pretty excited that it reduces "content creep" from 6 different docs to just 1! 😎

@stevhliu
Member Author

The first draft of the practical guides in the base classes section is finished now! Please feel free to check it out and leave any comments or feedback (not sure why the feature extractor and processor docs aren't showing in the preview at the moment) 😄

I'll start working on the inference section after I review the first draft.

@stevhliu stevhliu force-pushed the doc-redesign branch 3 times, most recently from 01434fd to ca38c6a Compare August 26, 2024 21:11
@stevhliu stevhliu force-pushed the doc-redesign branch 3 times, most recently from 0813af1 to bfc386d Compare September 9, 2024 23:23
@stevhliu stevhliu force-pushed the doc-redesign branch 3 times, most recently from fc48f55 to 75bace2 Compare September 25, 2024 23:32
@stevhliu stevhliu force-pushed the doc-redesign branch 6 times, most recently from d66930e to 9312112 Compare October 22, 2024 22:16
8 participants