
Use MDN data to power auto completion in editors through Language Server Protocol #199

Open
octref opened this issue Apr 9, 2018 · 15 comments
Labels: enhancement (Improves an existing feature.), idle (Issues and pull requests with no activity for three months.)

Comments

@octref
Contributor

octref commented Apr 9, 2018

Hello, this is Pine from the VS Code team. We are looking into using MDN data to provide always-up-to-date auto completion in VS Code, for example:

[image: idea]

The auto completion is generated by our CSS language server and CSS language service, which use the Language Server Protocol.

Here is a simple explanation: the Language Server Protocol is a language- and editor-independent protocol for enhancing the code editing experience. For example, Rust has https://github.com/rust-lang-nursery/rls, which complies with this protocol and powers editor extensions that offer auto completion in VS Code, Visual Studio, Atom, [Neo]Vim, Emacs, and any other LSP-compliant editor.
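To make the idea concrete, here is a minimal sketch of how a language server might turn a table of property descriptions into LSP completion items. The interfaces are local stand-ins covering only a few fields of the spec's CompletionItem and MarkupContent types, and the constant 10 is assumed to be CompletionItemKind.Property. The description table and its text are hypothetical placeholders, not MDN content.

```typescript
// Minimal local mirrors of two LSP types; only the fields used here.
interface MarkupContent {
  kind: "plaintext" | "markdown";
  value: string;
}

interface CompletionItem {
  label: string;
  kind?: number; // a CompletionItemKind value
  documentation?: string | MarkupContent;
}

// Assumed to match CompletionItemKind.Property in the LSP specification.
const COMPLETION_ITEM_KIND_PROPERTY = 10;

// Hypothetical description table; real text would come from MDN.
const cssDescriptions: Record<string, string> = {
  "align-self": "Placeholder summary for align-self.",
  color: "Placeholder summary for color.",
};

// Turn each property name and its summary into a completion item
// whose documentation the editor can render as Markdown.
function toCompletionItems(descriptions: Record<string, string>): CompletionItem[] {
  return Object.keys(descriptions).map(
    (name): CompletionItem => ({
      label: name,
      kind: COMPLETION_ITEM_KIND_PROPERTY,
      documentation: { kind: "markdown", value: descriptions[name] },
    })
  );
}

const items = toCompletionItems(cssDescriptions);
console.log(items.map((item) => item.label).join(", ")); // prints "align-self, color"
```

The server keeps nothing editor-specific here. Any LSP client that receives these items can show the same documentation.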

We are starting with CSS but could look into HTML too. I have one question, though:

Where is the data for item descriptions (such as CSS properties)? mdn/data and mdn/browser-compat-data don't seem to contain them. From my understanding, the "short description" seems to be part of the wiki, so it may be hard to put it into a JSON format, but I think presenting this data to programmers right in their editor would offer them a great experience.

Here is the issue tracking this on our side: microsoft/vscode-css-languageservice#68

@wbamberg
Contributor

wbamberg commented Apr 9, 2018

That's very interesting. Yes, things like the summary and the description of the syntax just live in the wiki at the moment, so it is hard to access them reliably.

It could be accessed fairly reliably, since the CSS pages are quite consistently structured. We have done this in the past in the Firefox devtools (https://www.youtube.com/watch?v=ptVtAEOK7y4), and it worked OK (we removed the feature because no one ever found it).

But it would be much better if the data were more explicitly structured, so you could rely on it more heavily. We've wanted to do this for a long time, but progress is slow. I think CSS summary and syntax description would be good places to start, because they are quite well-defined.

@frenic
Contributor

frenic commented Apr 9, 2018

I would be interested in the summary and description as well for csstype.

@a2sheppy
Contributor

a2sheppy commented Apr 9, 2018

We actually have a feature requirement for SEO over the mid-to-long term that requires the ability to set a summary independently from the text on the page (rather than the current technique of having a block of text in a span with the class "seoSummary"). That's because the requirements for a summary for SEO purposes are often in conflict with what we want to be saying on the page itself, and currently we find ourselves having to compromise, sometimes awkwardly, sometimes totally at the expense of good SEO practices.

So a feature request like this actually neatly backs up the existing need to have the same capability for SEO purposes.

@octref
Contributor Author

octref commented Apr 9, 2018

@wbamberg Your video is very cool, and I think a better place to display that data is right in the editor:

[image: hover]

(Imagine the content of the hover being the MDN doc)

The implementation is not very hard either: whenever the user hovers over a CSS word in VS Code, a request is sent to the CSS language server, which can respond with documentation (plain text or Markdown) that is shown in the editor.

Spec: https://microsoft.github.io/language-server-protocol/specification#textDocument_hover
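Here is a minimal sketch of such a hover handler. It assumes the descriptions are available as a plain map from property name to summary text. The Hover and MarkupContent interfaces are local stand-ins for the types in the linked specification and cover only the fields used. The lookup table and its text are placeholders, and the MDN URL pattern is an assumption. A real server would also compute the hover range and use a CSS parser rather than scanning characters.

```typescript
// Local mirrors of the LSP Hover/MarkupContent shapes (subset of fields).
interface MarkupContent {
  kind: "plaintext" | "markdown";
  value: string;
}

interface Hover {
  contents: MarkupContent;
}

// Hypothetical lookup table; keys are CSS property names.
const descriptions: Record<string, string> = {
  "align-self": "Placeholder summary for align-self.",
};

// Expand left and right from `offset` over characters allowed in a CSS identifier.
function wordAt(text: string, offset: number): string {
  const isIdentChar = (ch: string) => /[A-Za-z0-9_-]/.test(ch);
  let start = offset;
  let end = offset;
  while (start > 0 && isIdentChar(text.charAt(start - 1))) start--;
  while (end < text.length && isIdentChar(text.charAt(end))) end++;
  return text.slice(start, end);
}

// Answer a hover request: return Markdown documentation for a known
// property, or null so the editor shows nothing.
function doHover(text: string, offset: number): Hover | null {
  const word = wordAt(text, offset);
  const summary = descriptions[word];
  if (summary === undefined) return null;
  // Assumed URL pattern for MDN's CSS reference pages, used for attribution.
  const url = "https://developer.mozilla.org/en-US/docs/Web/CSS/" + word;
  return {
    contents: {
      kind: "markdown",
      value: summary + "\n\n[MDN Reference](" + url + ")",
    },
  };
}
```

Returning null for unknown words lets the editor stay silent on values like "center" that have no entry in the table.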

@a2sheppy Glad to hear this feature will benefit MDN too 😃

@a2sheppy
Contributor

a2sheppy commented Apr 9, 2018

@octref The big question will be whether the text requirements for the summaries needed for SEO and the text desired for something like what you're looking to build are the same, and I'm not sure they always will be. The summaries for SEO purposes have a major constraint on length (ideally between 150 and 160 characters). That's actually a great length for tooltips, but it does sometimes require interesting wording. :)
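If a single summary has to serve both purposes, an editor could still enforce the length limit on its own side. Here is a minimal sketch. It assumes the 160-character upper bound from the SEO guideline, a plain "..." suffix, and a cut at the last space. The function name is hypothetical.

```typescript
// Assumed limit, taken from the upper end of the 150-160 character SEO guideline.
const MAX_SUMMARY_LENGTH = 160;

// Shorten text to at most `max` characters, cutting at the last space
// that fits and appending an ellipsis so the result stays within `max`.
function truncateSummary(text: string, max = MAX_SUMMARY_LENGTH): string {
  if (text.length <= max) return text;
  const ellipsis = "...";
  const hardCut = text.slice(0, max - ellipsis.length);
  const lastSpace = hardCut.lastIndexOf(" ");
  // Fall back to a hard cut if there is no space to break on.
  const cut = lastSpace > 0 ? hardCut.slice(0, lastSpace) : hardCut;
  return cut + ellipsis;
}
```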

@octref
Contributor Author

octref commented Apr 9, 2018

@a2sheppy Yep, the SEO description doesn't always align with my use case. For example, for the property align-self, I would also need descriptions for all of its values.

The bigger picture from our side is to present the always-up-to-date information on MDN to users (as web APIs are evolving fast), so that we don't have to hand-maintain a schema and users get MDN's high-quality documentation right in their editors.
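For the align-self case, the data would need descriptions at two levels: the property itself and each of its values. Below is a hypothetical shape for that data, together with a prefix filter an editor could use for value completion. This is not an existing MDN schema, and the description text is placeholder.

```typescript
// Hypothetical shape for one CSS property entry; not an actual MDN schema.
interface ValueDescription {
  name: string;
  description: string;
}

interface PropertyEntry {
  name: string;
  summary: string;
  values: ValueDescription[];
}

const alignSelf: PropertyEntry = {
  name: "align-self",
  summary: "Placeholder summary for align-self.",
  values: [
    { name: "auto", description: "Placeholder description for auto." },
    { name: "center", description: "Placeholder description for center." },
    { name: "stretch", description: "Placeholder description for stretch." },
  ],
};

// Return the value descriptions whose name starts with what the user has typed so far.
function valueCompletions(entry: PropertyEntry, prefix: string): ValueDescription[] {
  return entry.values.filter((v) => v.name.indexOf(prefix) === 0);
}
```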

@octref
Contributor Author

octref commented Apr 11, 2018

@wbamberg @a2sheppy

Here is an observation of the data from our side: microsoft/vscode-css-languageservice#68 (comment)
And a rough plan: microsoft/vscode-css-languageservice#68 (comment)

I'm wondering whether you have a rough timeline estimate for open-sourcing the data, or whether we should go down the route of writing a crawler for MDN pages.

@jwhitlock
Contributor

@octref this project isn't on the timeline, as far as I'm aware. The browser-compat-data project will continue to be the focus until the data is migrated, hopefully by July. If you need to start soon, scraping may be the best option. It may be useful to publish your data schema, as a prototype for a future "official" effort.
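If scraping is the route, the existing span with the class "seoSummary" (mentioned earlier in this thread) is an obvious hook. The following is a minimal sketch of the extraction step only, and it assumes the page HTML has already been fetched. The regex approach is a deliberate simplification: it misses nested spans, multi-class attributes, and HTML entities, so a real crawler should use an HTML parser.

```typescript
// Pull the text of the first <span class="seoSummary"> element out of a page.
// Simplification: stops at the first </span>, so nested spans would cut it short,
// and HTML entities are left undecoded.
function extractSeoSummary(html: string): string | null {
  const match = /<span[^>]*class="seoSummary"[^>]*>([\s\S]*?)<\/span>/.exec(html);
  if (match === null) return null;
  const inner = match[1] || "";
  // Drop inline tags such as <code> and collapse runs of whitespace.
  return inner.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
}
```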

The MDN Product Advisory Board would probably be the right forum to present an opportunity and get it on the list. There are a few Microsoft representatives among the board members.

@octref
Contributor Author

octref commented Apr 13, 2018

@jwhitlock
Sure, I'm emailing Erika and Patrick. Meanwhile, is there any license protecting the data on MDN? Is it legal to scrape data from MDN and use it in VS Code?

@wbamberg
Contributor

@octref : https://developer.mozilla.org/en-US/docs/MDN/About#Using_MDN_Web_Docs_content

I agree that scraping is your best bet for now, but this is still a thing we want to do, and understanding how you use the data would be a big help to us in designing it.

@wbamberg
Contributor

@octref , we've been discussing this project to add CSS short descriptions to mdn/data. Briefly, we think it won't work, because mdn/data is CC0 licensed and Wiki content is CC-BY-SA licensed.

So we are currently working on an alternative plan, which is to keep the short descriptions in a separate mdn repo, mdn/short-descriptions. We will then publish a new npm package, probably called mdn-docs, that will be CC-BY-SA licensed and will include both CSS short descriptions and mdn/data.

I've drafted a doc describing this approach here: https://github.com/mdn/short-descriptions/blob/e9c275a1da7dbd40d502d70afbbd591f0f1ea81a/Project-overview.md (currently in this PR: mdn/short-descriptions#2) and would be happy to hear your feedback.

@octref
Contributor Author

octref commented Oct 21, 2018

@wbamberg License-wise, this is totally OK for us (VS Code). I also want to attribute MDN whenever I can.

As for your open questions, I think:

  • Publishing them as separate packages makes sense, as long as it's straightforward to consume them together, with each repo keyed by element/property names.
  • Therefore, I think mdn/short-descriptions is good for now. But going forward, if we decide to extract DOM API function signatures and descriptions (see microsoft/TypeScript#26404, "Enhance DOM API completion by sourcing descriptions from MDN"), where do we put all the data?
  • A good model going forward might be to extract the list of all HTML/CSS/JS elements from BCD and compile an mdn/index or something similar. Each package could then have mdn/index as a dependency and provide a specific subset of MDN's data (compat, short description, function signature, code sample, etc.).
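The mdn/index idea could look roughly like this: one shared list of names, with each package contributing a map keyed by those names. The function and data below are hypothetical, and the summaries and syntax strings are placeholders. The sketch only shows that a consumer can join the packages with plain lookups, and that an entry missing from one package simply stays absent.

```typescript
// One merged record per indexed name; fields are optional because not
// every package covers every name.
interface MergedEntry {
  name: string;
  summary?: string;
  syntax?: string;
}

// Hypothetical shared index that every package agrees to key on.
const index: string[] = ["align-self", "color"];

// Hypothetical per-package data, keyed by the same names.
const shortDescriptions: Record<string, string> = {
  "align-self": "Placeholder summary for align-self.",
};
const syntaxes: Record<string, string> = {
  "align-self": "<placeholder syntax for align-self>",
  color: "<placeholder syntax for color>",
};

// Join the packages on the shared names, copying only the fields that exist.
function mergeByName(
  names: string[],
  summaries: Record<string, string>,
  syntaxTable: Record<string, string>
): MergedEntry[] {
  return names.map((name): MergedEntry => {
    const entry: MergedEntry = { name };
    if (Object.prototype.hasOwnProperty.call(summaries, name)) entry.summary = summaries[name];
    if (Object.prototype.hasOwnProperty.call(syntaxTable, name)) entry.syntax = syntaxTable[name];
    return entry;
  });
}

const merged = mergeByName(index, shortDescriptions, syntaxes);
```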

Thanks again for all your hard work!

@wbamberg
Contributor

> Publishing them as separate packages makes sense, as long as it's straightforward to consume them together with each repo keyed by element/property names.

Yes, I think part of the difficulty with making them separate packages is cross-referencing. At the moment mdn-browser-compat-data has one way of referencing items, and mdn-data has a different way.

If mdn-short-descriptions goes the BCD way (which I think it should), and you need data from both mdn-short-descriptions and mdn-data, then it seems to me that this makes things complicated for consumers (e.g. you).

I think we should consider restructuring mdn-data to use the BCD style, but that would be a big breaking change, of course.
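To make the cross-referencing problem concrete: BCD identifies a CSS property by a dotted path such as css.properties.align-self, while mdn-data keys the same property as "align-self" inside css/properties.json. Below is a minimal sketch of a translator for that one case. The MdnDataRef shape and the function name are hypothetical. Other categories such as selectors and at-rules are deliberately left unhandled, on the assumption that their mdn-data keys can't always be derived from the BCD name by string manipulation alone.

```typescript
// Where an item might live in an mdn-data-style layout: a JSON file plus a key in it.
interface MdnDataRef {
  file: string;
  key: string;
}

// Map a BCD-style dotted identifier such as "css.properties.align-self" to an
// mdn-data-style reference. Only CSS properties are handled in this sketch;
// anything else returns null.
function bcdPathToMdnDataRef(bcdPath: string): MdnDataRef | null {
  const parts = bcdPath.split(".");
  if (parts.length !== 3 || parts[0] !== "css" || parts[1] !== "properties") {
    return null;
  }
  return { file: "css/properties.json", key: parts[2] };
}
```

A consumer needing both data sets would call something like this for every item, which is the extra burden a shared naming scheme would remove.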

> Therefore, I think mdn/short-descriptions for now is good. But going forward, if we decide to extract DOM API's function signatures & descriptions for microsoft/TypeScript#26404, where do we put all the data?

In the design conversation I'm trying to keep separate two realms: the realm of the GitHub repos where we contribute and review work, and the realm of the npm packages which we publish to consumers. The consumers should not care about the GH repos we have or how they are organized.

So in the "current proposal" diagram we're building mdn/data and mdn/short-descriptions into a single package mdn-docs. If we later want to add extra stuff (like function signatures), we'd add it in there too.

So for reasons like this I favour a single package for data and short-descriptions.

I'm not certain that this is a sensible approach and would welcome feedback on it. It's worth emphasizing that this is all quite experimental for us.

@octref
Contributor Author

octref commented Oct 31, 2018

Just found this issue about HTML tags in the VS Code repo: microsoft/vscode#25898. It is currently the most-voted feature request for HTML.

Sharing it here since @wbamberg, @atopal, and I talked a bit about providing translations for HTML / CSS / JS descriptions. If VS Code / Microsoft does decide to work on translating the documentation, I can suggest that the translations go directly back to MDN and that we source them from there. This would allow others to consume the translations, too.
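If translations flowed back to MDN, a consumer such as an editor would still need a fallback for descriptions that haven't been translated yet. Here is a minimal sketch. It assumes a hypothetical layout of locale, then item name, then description, and it tries three things in order: the exact locale, then its base language, then en-US.

```typescript
// Hypothetical layout: translated short descriptions keyed by locale, then by item name.
type LocalizedDescriptions = Record<string, Record<string, string>>;

const FALLBACK_LOCALE = "en-US";

// Try the exact locale, then its base language (e.g. "de" for "de-AT"), then en-US.
function describe(
  data: LocalizedDescriptions,
  locale: string,
  name: string
): string | undefined {
  const base = locale.split("-")[0];
  const candidates = [locale, base, FALLBACK_LOCALE];
  for (let i = 0; i < candidates.length; i++) {
    const table = data[candidates[i]];
    if (table !== undefined && table[name] !== undefined) return table[name];
  }
  return undefined;
}
```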

@larsonreever

@a2sheppy Yeah, I agree with you, since I have been working in SEO myself for the last 15 years. This problem is common: people are often unable to see things from an SEO perspective, and at times that makes integrating SEO best practices into the code, and getting developers on the same page, a bit awkward. I happen to work on my own local site here, based on IIS, and it has a lot of limitations when it comes to following best practices.

github-actions bot added the idle label (Issues and pull requests with no activity for three months.) on Jan 4, 2023

6 participants