Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor ECMA-SL Operators to Use External Functions #146

Merged
merged 16 commits into from
Jul 25, 2024

Conversation

andreffnascimento
Copy link
Contributor

In this PR, we update the ECMA-SL operators, changing some of them to be implemented as external functions. To avoid clutter, we will push the changes to the interpreters after the PR is reviewed. Below, we provide a rationale for each commit:

Changes

  1. refactor(syntax): remove constants from operators
    The implementation of constants was inconsistent. Some were directly converted to float values in the parser, while others were represented internally as operators and then converted to float values in the compiler. To address this, we have removed constants from the .esl operators and now convert them to values in the .esl parser.

Note: I feel like the proper place for constants is in the stdlib of ECMA-SL.

  1. chore(syntax): change the syntax of the eq operator to ==
    Since the syntax for the not equals operator is defined as !=, we have changed the syntax for the equals operator from = to ==.

  2. clean(syntax): rename NE operator to Ne
    All operators (variants) are written with the first letter of each word capitalized and the rest lowercase. This change fixes the name of the operator to match the convention.

  3. chore(syntax): match the syntax of the ite operator in .cesl with .esl
    The syntax for the ifThenElse operator in .esl (e1 ? e2 : e3) was different from that in .cesl (ite(e1, e2, e3)). This commit updates the syntax in the .cesl parser to match the one from the .esl parser. Additionally, we rename the operator to Conditional operator to align with the standard naming convention.

I noticed that this operator does not behave like the standard conditional operators, in that both expressions e2 and e3 are always evaluated regarding the value of e1. Should we change this?

  1. chore(parser): remove locations from .cesl and .esl parsers
    Locations should not exist in the parsers, as they are internal values used to manipulate objects. Since the programmer has no control over the location assigned to an object, they should not be able to access the object directly using its location.

Unless there is another reason which I'm not aware?

  1. chore(syntax): externalize math operators
    Most mathematical operators were already externalized except for the abs, sqrt, floor, ceil, and trunc. With this change, they are now implemented as external functions.

  2. chore(syntax): externalize string operators
    The functions to manipulate strings should all be implemented as external functions, as discussed in ECMA-SL Operator Refactoring (+more syntax) #134

  3. chore(syntax): externalize list operators
    All list operations can be defined using the hd (head) and tl (tail) operations. Therefore, these two were retained as unary operators, while the remaining list operations were moved to external functions.

  4. chore(syntax): rename casting operators for consistency
    The syntax for some of the type casting operators did not match their internal names in the language. This commit fixes the discrepancies.

Instead of always requiring explicit casting between two types (e.g., type1_to_type2), should we allow our type casting operators to be polymorphic, where we simply specify the target type (similar to the approach in C/C++)?

  1. chore(syntax): remove symbolic expressions
    After Refactor: Using values of Smtml #128, symbolic expressions became obsolete. This commit removes them from the syntax of both .esl and .cesl.

  2. refactor(interpreter): clean the eval_op module
    With this commit, we cleaned the operator.ml and eval_op.ml modules. Now all operators are implemented in the same style. We also fix some of the error messages that can be generated while evaluating these operators.

  3. refactor(parser): remove the op_parser file
    The op_parser.mly file was introduced previously to handle the large number of operators available in .esl and .cesl. This file helped prevent duplicated code in both parsers. The main consequence of having this shared file, however, is that it becomes slightly harder to comprehend the parsers: the operator tokens and their usage are defined in the shared file, but their precedence and some operator tokens need to be in the main parser file. Since most of these operators have been externalized, I believe the disadvantages outweigh the advantages, hence the removal of this file.

I purposefully made this the last commit of this PR, so we can easily revert these changes if you don't approve them.

Future Work

  • Since the changes in Refactor: Using values of Smtml #128 (and probably even more so after this PR) there have been a lot of problems introduced in the ECMARef interpreters (small issues that cause the interpreter to crash). In of my next PRs I will attempt to address most of them, so we can go back to having a 93% test success rate.
  • The external files (in both the symbolic and specially in the concrete interpreters) are really messy. I plan on cleaning them ASAP. The same thing goes for the stdlib, which I want to split into multiple files.
  • If you don't oppose @filipeom, I will be changing the typeof function to be an operator once again. I think it makes total sense.
  • Two of the objects functions objectToList and objectFields should also be defined as external operators. The in_obj can also be defined as an external operator, but this would require another mechanism for differentiating a field not existing in an object vs it being set to 'undefined. Perhaps introducing the special absent value as in JISET
  • We are still missing line comments in the parsers
  • We are still missing proper syntax errors

@andreffnascimento andreffnascimento self-assigned this Jul 24, 2024
@andreffnascimento andreffnascimento added refactor General change of the implementation scope-language Related to the ECMA-SL language design labels Jul 24, 2024
Copy link
Member

@filipeom filipeom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry had busy day, only got to see this now.

Some comments about the Changes:

1. refactor(syntax): remove constants from operators

Note: I feel like the proper place for constants is in the stdlib of ECMA-SL.

I agree! 😄

4. chore(syntax): match the syntax of the ite operator in .cesl with .esl

I noticed that this operator does not behave like the standard conditional operators, in that both expressions e2 and e3 are always evaluated regarding the value of e1. Should we change this?

I don't think I understood, can you elaborate? For clarity, Eval.tripo Ite should return the appropriate value during evaluation: https://github.com/formalsec/smtml/blob/main/lib/eval.ml#L228

5. chore(parser): remove locations from .cesl and .esl parsers

Unless there is another reason which I'm not aware?

Unless you want to leak memory 😅 I think it was a major security issue we allowed this to happen. Nice catch!

6. chore(syntax): externalize math operators

I actually think abs, floor, ceil, and trunc could be operators. abs is similar to unary negation and floor or trunc are similar to casting operators. I.e., floor is just the composition of int_to_float o float_to_int (I think this turned into a pitch to just define these functions directly in the stdlib instead of as external functions 😆). And, ceil could also be defined similarly.

12. refactor(parser): remove the op_parser file

I think it's fine. My major concern of duplication would be about changing one thing in one parser and not in another. But I think the tests we have already will catch inconsistencies.

Future Work

If you don't oppose @filipeom, I will be changing the typeof function to be an operator once again. I think it makes total sense.

I think as long as it still returns a string we can still reason about it symbolically.

Two of the objects functions objectToList and objectFields should also be defined as external operators. The in_obj can also be defined as an external operator, but this would require another mechanism for differentiating a field not existing in an object vs it being set to 'undefined. Perhaps introducing the special absent value as in JISET

I like in_obj as an infix operator. Also, I dislike adding another value for representing nothing. We already have null, undefined, and void. I actually think we could move into Kotlin's direction in the future: https://stackoverflow.com/questions/55953052/kotlin-void-vs-unit-vs-nothing. We could build a stronger type system with Unit and Nothing instead of undefined and null.

Below I left you some nitpicks:

P.s.: Sorry for the possible typos. I'm doing this in a small screen.

@andreffnascimento
Copy link
Contributor Author

I don't think I understood, can you elaborate? For clarity, Eval.tripo Ite should return the appropriate value during evaluation: https://github.com/formalsec/smtml/blob/main/lib/eval.ml#L228

It returns the proper value, yes. But have a look at this example:

function foo() { print "foo"; return 10; }
function bar() { print "bar"; return 20; }
x := true ? foo() : bar();
print x;

With this code, the system is going to print:

foo
bar
10

My point is, should we even evaluate the bar function?

@filipeom
Copy link
Member

filipeom commented Jul 25, 2024

I don't think I understood, can you elaborate? For clarity, Eval.tripo Ite should return the appropriate value during evaluation: https://github.com/formalsec/smtml/blob/main/lib/eval.ml#L228

It returns the proper value, yes. But have a look at this example:

function foo() { print "foo"; return 10; }
function bar() { print "bar"; return 20; }
x := true ? foo() : bar();
print x;

With this code, the system is going to print:

foo
bar
10

My point is, should we even evaluate the bar function?

Oh, I see. Yes, for concrete execution this is indeed unintended behaviour:

| TriOpt (op, e1, e2, e3) ->
let v1 = eval_expr state e1 in
let v2 = eval_expr state e2 in
let v3 = eval_expr state e3 in
let eval_op_fun () = Eval_op.triop_semantics op v1 v2 v3 in
eval_operator eval_op_fun [ e1; e2; e3 ]

Seems like triops always evaluate the arguments before dispatching on the operator. We probably need a another case for TriOpt (Conditional, ... now to avoid unwanted side-effects

andreffnascimento added a commit that referenced this pull request Jul 25, 2024
@andreffnascimento
Copy link
Contributor Author

@filipeom I've addressed the review in 8a9219b

I've also pushed the changes to the interpreters/tests

Copy link
Member

@filipeom filipeom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 😄

@andreffnascimento andreffnascimento merged commit 4e21eff into main Jul 25, 2024
2 checks passed
andreffnascimento added a commit that referenced this pull request Jul 25, 2024
@andreffnascimento andreffnascimento deleted the fix-operators branch July 25, 2024 16:29
filipeom pushed a commit that referenced this pull request Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor General change of the implementation scope-language Related to the ECMA-SL language design
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants