Skip to content

NEW (RegexEngine) @W-15985046@ Implement regex engine with trailing whitespace rule #34

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 2, 2024

Conversation

ravipanguluri
Copy link
Contributor

In this PR:

  • Implement rule that checks for trailing whitespaces at the end of lines within an Apex class
  • runRules() method will run the trailing whitespace rule when pointed to a file and/or directory
  • The engine will emit appropriate error/logging messages when applicable

@ravipanguluri ravipanguluri marked this pull request as draft June 27, 2024 17:30
for (const file of allFiles) {
const fileData = fs.statSync(file)
if (fileData.isFile()) {
codeLocations = codeLocations.concat(await this.scanFile(file));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting use of codeLocations. We originally planned to have codeLocations be plural for path based violations... where 1 violation is associated with a path through the code. But I haven't considered having 1 violation (in this case for the 1 rule) be associated with multiple code locations.

Instead I would have thought we would want 1 violation per code location.

The question here becomes how VSCode in the future will use this. Note that the primaryLocationIndex was intended to be the single place that vscode could mark the violation for the user to see. This is another reason why I think we want 1 violation per code location in this case... so that vscode can show each of these locations separately as individual violations.

Thoughts? @jag-j

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and I just noticed... you aren't even creating one violation per file... you literally are creating 1 violation for all files in all locations. Interesting. This is a good use case of what external engine-api users might think.

But yeah, the CLI, and things like our HTML output, etc... might trim these locations thinking that multiple locations are for non-standard rule types. This makes me think we I should add a validation step in Core that ensures that if an engine returns a violation for a Standard rule type that has multiple code locations - then it will error to inform users to complain to the engine author that they did something wrong.

So for now, please change so that every regex match ends up with 1 code location per violation since this regex engine is returning standard rule types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean I should increment (or somehow update) the primaryLocationIndex for each violation then? Or just leave them all to be the same? And yes, my interpretation was based on the fact that code locations was a list, so I thought that meant for each rule you collect all the code locations where there was a violation. I am changing it to have a single code location for every violation now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the primaryLocationIndex will be 0 (for 0 based indexing) when you have 1 code location for each violation.

For path based rules, the violation will be assocaited with a path and the primaryLocationIndex could be the first, the last, or somewhere in the middle to help clients like vscode know where to report the violation associated with the path. We'll see how that turns out when we eventually migrate over the salesforce graph engine.

Comment on lines 25 to 26
message: "",
resourceUrls: [""]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resourceUrls are option here. On the rule description we encourage them... but here they are optional (to be appended to the final list of urls from the rule description that the core framework gives back to the client for the violation). So remove resourceUrls all together (delete line 26).

But we definitely want to fill in some sort of message here (even if it ends up being the same message as the rule itself).

Comment on lines 35 to 41
const fileType: string = path.extname(fileName)
let codeLocations: CodeLocation[] = [];
if (fileType === APEX_CLASS_FILE_EXT){
codeLocations = await this.getViolationCodeLocations(fileName)
}

return codeLocations;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be:
return path.extname(fileName) === APEX_CLASS_FILE_EXT ? await this.getViolationCodeLocations(fileName) : [];


private async getViolationCodeLocations(fileName: string): Promise<CodeLocation[]> {
const codeLocations: CodeLocation[] = [];
const data: string = fs.readFileSync(fileName, {encoding: 'utf8', flag: 'r'})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 'r' the default of flag? If so, I don't think you need to specify it.
Also, instead of data, maybe say fileContents.


split_data.forEach((line: string, lineNum: number) => {
let codeLocation: CodeLocation;
regex = /(?<=[^ \t\r\n\f])[ \t]+$/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are hard coding this, can't you move it to line 48 so you can make regex a constant?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also given your line by line approach... do we care about (?<=[^ \t\r\n\f]) ?

That is... if a line is just a line of spaces... shouldn't it also be flagged as a violation?

Wouldn't /\s+$/ be sufficient for line by line approach?

Copy link
Contributor Author

@ravipanguluri ravipanguluri Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that whitespace lines could be allowed in between lines of code for stylistic purposes. But I can change it to the simpler regex expression if that wasn't the intent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... that's a good question... if a line contains nothing but whitespace characters (excluding the single newline case) - is that considered trailing whitespace or not. Might want to ask the others what they think. It really comes down to the nature of the rule - to help folks save on the amount of characters they have. So I would think we would want to detect this.

private async getViolationCodeLocations(fileName: string): Promise<CodeLocation[]> {
const codeLocations: CodeLocation[] = [];
const data: string = fs.readFileSync(fileName, {encoding: 'utf8', flag: 'r'})
const split_data: string[] = data.split("\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what we want to split on? I would think windows would have \r characters left behind and thus triggering your regex to flag every single line.

let regex: RegExp;
let match: RegExpExecArray | null;

split_data.forEach((line: string, lineNum: number) => {
Copy link
Collaborator

@stephen-carter-at-sf stephen-carter-at-sf Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of looping line by line, did you consider applying a regular expression to the entire file?

Basically, thinking ahead with how we would add in more regular expression rules to this engine... we don't want the restriction to be that the regular expressions that we make are subject to just a single line do we?

So maybe we just search for something like /\s+$/m on the entire file... and then only if violations are found do we postprocess to find the line numbers (by using the position of the match and counting the number of newlines between the start and that match, etc). Just a suggestion that will then open up the possibilities for a generalized regular expression rule system.

Copy link
Collaborator

@stephen-carter-at-sf stephen-carter-at-sf Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also... I wonder about the case where we could have a ton of newline characters at the end of a file.. .is that considered trailing whitespace?

maybe something like /[ \t]+\r?\n|\r?\n{2,}$/g; instead? Can this be used to get all the locations of the trailing whitespaces?

Copy link
Contributor Author

@ravipanguluri ravipanguluri Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked through the approach that I took to solving the problem with @jag-j and one thing i'm worried about is that it could be somewhat complicated for users to make rules if we ingest all the regex expressions the same way. I also proposed to Jag that a delimiter be something that the user specifies in the config file as we expand. Because there are also other cases like finding a certain regex pattern across a block of code that may require a different splitting procedure. However, I also do see the merit of having all the rules be composed the same way, so I am completely open to changing my implementation. Further, I also see that splitting on the "/n" is problematic.

@ravipanguluri ravipanguluri marked this pull request as ready for review July 1, 2024 22:56
private async scanFile(fileName: string): Promise<Violation[]> {
const violations: Violation[] = [];
if (path.extname((fileName)) === APEX_CLASS_FILE_EXT) {
const fileContents: string = fs.readFileSync(fileName, {encoding: 'utf8'})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ravipanguluri Not sure if you already discussed this with @stephen-carter-at-sf. Can this be async readFile instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see ravi already submitted. But yeah, this probably should have been an async read for sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you had said we were going to leave it as sync for now which is what I told to @jag-j during our 1-1, but I realize that was a fair bit prior to this point. I can make a new short work item for it and change it to async if you want. It shouldn't take too long. My bad on this.

private updateNewlineIndices(fileContents: string): void {
const newlineRegex: RegExp = new RegExp(this.lineSep, "g")
const matches = fileContents.matchAll(newlineRegex);
this.newlineIndexes = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick - This is fine, but I would prefer this method returning newlineIndexes rather than having it as a class member. Making testing a lot easier and you can also test each of these individual methods (updateNewlineIndices, getColumnNumber, getLineNumber) easily.

Copy link
Contributor

@jag-j jag-j left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, push your pending changes and merge the PR.

@ravipanguluri ravipanguluri merged commit 1f269c1 into dev Jul 2, 2024
5 checks passed
@ravipanguluri ravipanguluri deleted the rp/W-15985046 branch July 2, 2024 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants