Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add "forbid-implicit-declarations" lint rule #244

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ghost
Copy link

@ghost ghost commented Mar 27, 2020

Fixes #217

@googlebot googlebot added the cla: yes All contributors in pull request have signed the CLA with Google. label Mar 27, 2020
Copy link
Collaborator

@fangism fangism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest we build up a full suite of test cases first before diving into implementation.
Good start so far.


TEST(ForbidImplicitDeclarationsRule, FunctionFailures) {
auto kToken = SymbolIdentifier;
const std::initializer_list<LintTestCase> ForbidImplicitDeclarationsTestCases = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially focusing on test cases first:

  • [question for Verilog experts] can implicit nets also be created by merely referencing them, say on the RHS of an assign, or inside an expression? assign foo = ~can_this_be_an_implicit_net;. Years ago, I found such a case while debugging someone else's netlist, and root causing a chip failure as being attributed to something like a typo.

Suggest test cases:

  • where LHS includes identifiers inside concatenation grouping like {x, y, z} = ...
  • with multiple net assignments, e.g. assign a = b, c = d.
  • net implicit-decls/references inside generate blocks (including conditional, and loop)
  • LHS with array indices like a[j]
  • where net is explicitly declared after its first use (out-of-order)

The answers to the above may influence choice of implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, an implicit net cannot be created by referencing them on the RHS. Of course, this is according to the LRM, vendor's implementation can differ.

Using implicit declaration with array (assign a[0] = 1'b1) is also illegal. I believe the array indices must be resolved before the assignment, therefore if a does not exist, this would fail.

Another missing test case is implicit net declared in a terminal/port connection list:

module m;
    wire a;
    not U1(implicit_net_0, a);
    my_m U2(implicit_net_1);
endmodule

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great thanks for the clarification, and thanks for suggesting more test cases!
Your test case is exactly the situation I was thinking of (not usage in an expression but rather in a port connection). To that, I would add:

module m;
  my_m U3(.foo(implicit_net_2));
  my_m U4(.nonexistent_net_in_this_scope);   // assuming .x is equiv. to .x(x)
endmodule

@ghost
Copy link
Author

ghost commented Apr 8, 2020

Thanks @corco for suggesting more test cases!

@fangism, @corco What do you both think about such case (current implementation reports it as violation):

module m;
  wire a1;
  begin
    assign a1 = 1'b1;
  end
endmodule

@fangism
Copy link
Collaborator

fangism commented Apr 8, 2020

Thanks @corco for suggesting more test cases!

@fangism, @corco What do you both think about such case (current implementation reports it as violation):

module m;
  wire a1;
  begin
    assign a1 = 1'b1;
  end
endmodule

This test case, I would expect to be clean, because a1 is in scope.

The authoritative text I'm reading and interpreting is LRM 6.10, this passage:

If an identifier appears on the left-hand side of a continuous assignment statement, and that identifier has not been declared previously in the scope where the continuous assignment statement appears or in any scope whose declarations can be directly referenced from the scope where the continuous assignment statement appears (see 23.9), then an implicit scalar net of default net type shall be assumed. See 10.3 for a discussion of continuous assignment statements.

@ghost
Copy link
Author

ghost commented Apr 8, 2020

You're right. It's not an implicit net declaration and it shouldn't be reported as violation.
I thought it would be nice to report such assignment because it could be errror-prone. But I forgot that this example would also fit and it's not bad at all:

module m;

wire a1;

always_comb begin
  a1 = 1'b0;
end

endmodule

I'll write tests for that and fix code. Thanks for reply.

" endgenerate\n"
" assign a1 = 1'b0;\n"
"endmodule"},

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More test case ideas:

  • assignments/connections inside loop and conditional generate constructs

// TODO: module scope

// TODO: nets declared inside terminal/port connection list
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even more test case ideas based on this passage in LRM 6.10:

The implicit net declaration shall belong to the scope in which the net reference appears. For example, if the implicit net is declared by a reference in a generate block, then the net is implicitly declared only in that generate block. Subsequent references to the net from outside the generate block or in another generate block within the same module either would be illegal or would create another implicit declaration of a different net (depending on whether the reference meets the preceding criteria). See Clause 27 for information about generate blocks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lint rule is very close to detecting that all variables are declared in the scope they are used. One about implicit net declaration which yields valid, yet undesirable, Verilog, and the other one is about "no such variable", which is always invalid Verilog.

I would add a test case to verify the scope is proper

module m;
    wire a1;
    generate
        wire a2;
        assign a1 = 1'b1; // OK, a1 is in scope
    endgenerate
   assign a2 =1'b1; // Implicit net declaration
endmodule

Note that my simulator rejected @ldalek-antmicro's proposed test case because begin/end block is not allowed in a module. I'm trying to find the proper LRM section, but I believe block statement is allowed in processes only.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corco You are correct that bare begin/end blocks were removed from SV grammar (but were in Verilog-2001). We even have a lint rule for it: https://github.com/google/verible/blob/master/verilog/analysis/checkers/v2001_generate_begin_rule.h

I found that bare begin/end generate blocks were so common in practice, that I supported it in the yacc grammar of verilog.y (and more easily accept legacy verilog), only to provide that lint rule, intended for SV code. (I don't have dialect selection of rules implemented yet, though.)

Copy link
Author

@ghost ghost Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I added test case which @corco suggested. Also I replaced begin-end blocks with submodules. I got used to them too much and forgot to replace.

@@ -17,7 +17,6 @@

#include <set>
#include <string>
#include <stack>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include <vector>

Comment on lines 77 to 88
bool ContainsAncestor(const verible::Symbol* const symbol,
const verible::SyntaxTreeContext& context) const {
CHECK_NOTNULL(symbol);

// check for common ancestors
for (const auto& iter : context) {
if (iter == symbol) {
return true;
}
}
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suitable for SyntaxTreeContext, can we move it there?
I would also define it using std::find or absl::c_find.

https://github.com/abseil/abseil-cpp/blob/1112609635037a32435de7aa70a9188dcb591458/absl/algorithm/container.h#L215

Comment on lines 66 to 67
while ((scopes_.size() > 0) &&
(!ContainsAncestor(scopes_.back().symbol_, context))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quadratic: a linear search inside a while loop.
Is there a reason why a linear search is needed?
This seems like a job for automatic balancing, like:

https://github.com/google/verible/blob/241da929600ea0e458580df6f8bfe95b288cff08/common/text/syntax_tree_context.h#L37

namespace analysis {

// ForbidImplicitDeclarationsRule detect implicitly declared nets
class ForbidImplicitDeclarationsRule : public verible::SyntaxTreeLintRule {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is my first serious look at the implementation of this analysis, after we've amassed a healthy collection of test cases.
I'll start with high-level design thoughts. @hzeller This might be of interest to you, since you're looking at lint rule design issues these days.

I see that you've chosen to base this on SyntaxTreeLintRule which seems intuitive, and a combination of matchers.

  • Keep in mind that matchers, though powerful and expressive, run on every node that is visited (HandleNode), so every node is starting a new pattern search.
  • The limitation of a using SyntaxTreeLintRule is that it lacks control over the visitation of nodes, which is being done in SyntaxTreeLinter::Visit. Why was this done? SyntaxTreeLinter collects a bunch of rules, and executes them all in a single CST traversal that builds up and tears down SyntaxTreeContext once. The implication is that each instantiated rule object must maintain enough state to have its actions interleaved among other "concurrently" executing rules. Had we let each rule manage its own traversal, they could simply look like functions of CST structures that return collections of lint violations. However, we are not forced to share traversals with a SyntaxTreeLintRule, we have the option of owning and customizing traversal by using the more general TextStructureLintRule.

Looking over the current implementation, I see effort in maintaining a stack that resembles the SyntaxTreeContext maintained by SyntaxTreeLinter and its base TreeContextVisitor. (Note that verible::TreeUnwrapper is another subclass of TreeContextVisitor.) Only instead of growing the stack on every node, you only grow the stack on nodes bearing lexical scope. If you extended TreeContextVisitor to include a scope-stack (Verilog-specific, no longer language-agnostic) in addition to the normal SyntaxTreeContext stack, that could be the basis for all scoped analyses that use symbol tables (like your DeclType).

What about matching? In past PRs I've often offered the alternative of using direct substructure access through CST/ functions, which is much faster, because it is not based on any tree-searching/predicate-matching. The alternative to grabbing .Bind() symbols from BoundSymbolManager is direct access.

  • From every net declaration (or element declaration) you can directly reach the identifier(s) it declares precisely.
  • From every assignment you can isolate the LHS from the RHS:
    • You could set state variables to distinguish when you're in the left vs. right instead of trying to infer from upward inspection.
    • Based on left vs. right, you would know whether you are looking at lvalues vs. rvalues (but LHS references can have rvalues in array indices, so be cautious there.)
  • Most of the logic would be in the form of a sparse switch-case statement on node types (much like the formatter's TreeUnwrapper::SetIndentationsAndCreatePartitions) whose interesting cases include declarations of several types, assignments, instantiations, loop/conditional generate blocks, references, and non-interesting cases would just be default traversal.

Does this sound simpler than your current implementation?

@ghost
Copy link
Author

ghost commented Apr 24, 2020

Okay. I've modified my implementation.
First of all I've concentrated on Visitor pattern and extended TreeContexVisitor.

I'm using ScopeTreeVisitor to build scope stack (with declared & referenced nets) and call
handlers on new scope/new declaration/new reference.

At this moment I'm curious what do you think about that approach (apart from missing tests for auto_pop_stack.h and score_tree_visitor.h)?

Copy link
Collaborator

@fangism fangism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress on this.
Suggest continuing to expand test cases (noted as TODOs), and hoping that might lead to implementation simplification.

Comment on lines 58 to 59
CHECK_NOTNULL(syntax_tree);
syntax_tree->Accept(this);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can also write: ABSL_DIE_IF_NULL(syntax_tree)->Accept(this);

Comment on lines 69 to 70
const auto& leaf = *ABSL_DIE_IF_NULL(
dynamic_cast<const verible::SyntaxTreeLeaf*>(&symbol));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use SymbolCastToLeaf from common/text/tree_utils.h (for consistency)

const auto identifier = leaf.get().text;

// Search upward stack and for declared nets.
for (auto itr = scope.rbegin() ; itr != scope.rend() ; ++itr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Search upward stack and for declared nets.
for (auto itr = scope.rbegin() ; itr != scope.rend() ; ++itr) {
const auto& declared_nets = (*itr)->declared_nets_;
const auto& net = declared_nets.find(identifier);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto& net_iter = ...


std::set<verible::LintViolation> violations_;

ScopeTreeVisitor visitor_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScopeTreeVisitor is used as both a base class (inheritance) and a member (containership). Design-wise, we probably want to use only one of these.

const auto tag = verilog_tokentype(leaf.Tag().tag);

switch (tag) {
case verilog_tokentype::SymbolIdentifier: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to register declarations coming from module ports? Also keep in mind there are two style of ports: ANSI, and non-ANSI style.

Comment on lines 68 to 90
} else if (context.DirectParentsAre({NodeEnum::kUnqualifiedId,
NodeEnum::kLocalRoot,
NodeEnum::kReference,
NodeEnum::kReferenceCallBase,

NodeEnum::kLPValue,
NodeEnum::kNetVariableAssignment,
NodeEnum::kAssignmentList,
NodeEnum::kContinuousAssignmentStatement}) ||

context.DirectParentsAre({NodeEnum::kUnqualifiedId,
NodeEnum::kLocalRoot,
NodeEnum::kReference,
NodeEnum::kReferenceCallBase,

NodeEnum::kExpression,
NodeEnum::kOpenRangeList,
NodeEnum::kBraceGroup,
NodeEnum::kLPValue,

NodeEnum::kNetVariableAssignment,
NodeEnum::kAssignmentList,
NodeEnum::kContinuousAssignmentStatement})) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above: can you track some state as you traverse the tree without having to query the context?

Suggest some comments here declaring intent (examples might help). It looks like (?) you are looking for lvalues, based on kLPValue.

What about rvalues? From the other thread about test cases:

"The implicit net declaration shall belong to the scope in which the net reference appears."

I think net references covers both lvalues and rvalues. If that is the case, your logic for entering this condition may actually be simpler.

namespace analysis {

// This tree visitor traverse CST tree and mantains scope stack
class ScopeTreeVisitor : public verible::TreeContextVisitor {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks sufficiently specialized that we might just call this class something like NetSymbolResolver.

Comment on lines 235 to 236
// TODO: nets declared inside terminal/port connection list
// TODO: assignments/connections inside loop and conditional generate constructs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest we keep expanding on some of these test cases sooner, as handling these may actually simplify some of the tracking logic.

return *ABSL_DIE_IF_NULL(stack_.back());
}

typedef std::vector<value_type*> stack_type;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typedef std::vector<value_type> stack_type;

By removing the pointer from this implementation, you make this a suitable base class for SyntaxTreeContext.

class SyntaxTreeContext : public AutoPopStack<const SyntaxTreeNode*> {
  ...
 public:
  using AutoPop = base_type::AutoPop;
  ...
};

The only adjustment you might have to make is that in the SyntaxTreeContext subclass top() (or renamed) should return a const T& (or const SyntaxTreeNode*&) to be agnostic to whether T is a pointer type or object type.

From a mutability perspective, SyntaxTreeContext was designed to forbid any mutation to the stack elements, even the top element -- everything is const, and AutoPop is the only interface for mutation. But I'm aware you intend to use it in this PR with a mutable top/current scope, because you have to register declared symbols. So as far as class design is concerned, the common base class may support mutable elements, but the subclasses may choose to further restrict the interfaces for what is mutable (where private inheritance can help).

Let's say your scope-stack subclass wants to ensure that only the top-most scope can be modified, but all others can be looked-up with const accesses. You could lock down most of your interface by exposing only const iterators/methods (hiding the mutable ones), except for T& top() as the only mutable entry point. The base class could provide both const and non-const top(), and the subclass could present only the const one if it chose to.

Keep this in mind should you continue to refactor this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did write this class in mind that it will be used in SyntaxTreeContext. Thanks for noticing that. I think that it will be easier to work on this class in separate PR #289

Copy link
Collaborator

@corco corco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the test cases, the only problem I see is that module in module do not have visibility on their parent scope,

{"module m;\n"
" wire a1;\n"
" module foo;\n"
" assign a1 = 1'b0;\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable a1 is not visible in module foo, linter should fail here

Copy link
Author

@ghost ghost Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to LRM 23.4 Nested modules:

A module can be declared within another module. The outer name space is visible
to the inner module so that any name declared there can be used

example (modelsim):

  module m;
    wire a1;
    assign a1 = 1'b0;

    module foo;
      assign a1 = 1'b1;
    endmodule;

    initial $monitor("a1 = %b", a1);
  endmodule

output is an invalid 'X' value:

# a1 = x

So a1 is visible in foo module and it's not an implicit net.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, you are right, sorry. SystemVerilog is so ugly sometimes. I was used to nested modules with ports, which needs to be instantiated manually and not implicitly like your example.

I didn't know they had access to the parent namespace.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both of you for your scrutiny over the test cases.

I also just learned this about implicit instantiation (LRM 23.4):

"Nested modules with no ports that are not explicitly instantiated shall be implicitly instantiated once with an instance name identical to the module name. Otherwise, if they have ports and are not explicitly instantiated, they are ignored."

" assign a1 = 1'b0;\n"
" endmodule\n"
" assign a1 = 1'b1;\n"
"endmodule"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This last one is the same as the one I commented earlier. One of the two should be removed.

" assign ", {kToken, "x3"}, " = 1'b0;\n"
" assign ", {kToken, "x2"}, " = 1'b0;\n"
" assign x1 = 1'b1;\n"
"endmodule"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, the module declared in module should not have access to the parent variables

@corco
Copy link
Collaborator

corco commented Apr 29, 2020

A failing case of the current implementation is hierarchical references:

module m;
  foo u1();
  assign u1.a1 = 1'b0; // This is not implicit net declaration
  assign $root.a1 = 1'b0; // same
  assign $unit.a1 = 1'b0; // same
endmodule;

A test case that I believe should be added (and passes) is using assign/force in procedural assignment:

module m;
  initial
    assign a1 = 1'b0; // Not implicit net declaration, this is "not such variable a1"
endmodule;

@hzeller
Copy link
Collaborator

hzeller commented Jan 29, 2021

What happened to this pull request ?

@fangism
Copy link
Collaborator

fangism commented Jan 29, 2021

I would actually recommend re-implementing this entirely using the current symbol table's data, which does most of the heavy lifting of symbol resolution, but doesn't check for the ordering of definition/use (which could be done by comparing string_view positions).

@ghost
Copy link
Author

ghost commented Feb 2, 2021

I agree with @fangism. I'll keep the testsuite and re-implement it using current scope/symbol resolution code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes All contributors in pull request have signed the CLA with Google.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Forbid implicit declarations
4 participants