-
The tagline does not represent the full scope; it is defined in somewhat more detail here: https://projects.eclipse.org/projects/technology.apoapsis The ability to manage data across individual runs was one of the main motivations for the server implementation.
The problem with that is that the ORT result model is NOT stable. So if we store ORT results as plain JSON, we will have to apply migrations to them to be able to read older results. Being able to manage breaking changes in the ORT result model was one of the reasons for the decision to map it to a relational representation. The complexity of the current schema is IMO mainly caused by the fact that it had to be developed in a rush, but it is possible to improve that. Also, providing data to the UI efficiently would likely require mapping the stored raw results to another model anyway.
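To make the migration point concrete, here is a minimal sketch of what reading older plain-JSON results could look like, assuming each stored result carries a schema version. All names, field names, and version numbers are purely illustrative; this is not existing ORT or ORT Server code.

```kotlin
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.buildJsonObject
import kotlinx.serialization.json.jsonObject

// One migration step per breaking change to the result model (names are invented).
val migrations: Map<Int, (JsonObject) -> JsonObject> = mapOf(
    1 to { obj ->
        buildJsonObject {
            obj.forEach { (key, value) ->
                // Example: a field renamed between model versions.
                put(if (key == "old_field_name") "new_field_name" else key, value)
            }
        }
    }
    // 2 to { ... }, 3 to { ... }, one entry per breaking change ever released.
)

// Bring a stored raw result up to the current model version before deserialization.
fun migrateToLatest(rawJson: String, storedVersion: Int, latestVersion: Int): JsonObject {
    var result = Json.parseToJsonElement(rawJson).jsonObject
    for (version in storedVersion until latestVersion) {
        result = migrations.getValue(version)(result)
    }
    return result
}
```

Every breaking change ever shipped would have to stay in that chain forever, which is exactly the maintenance burden the relational mapping tries to avoid.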
-
To rephrase @mnonnenmacher's answer: if you store just the ORT result in the database, you can no longer filter the ORT runs to answer questions such as:
This kind of advanced statistics was one of the main motivations for ORT Server.
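For illustration, a cross-run question such as "which repositories had the most runs with issues in the last 30 days" maps naturally onto a relational schema, whereas with opaque JSON blobs every stored result would first have to be deserialized. The table and column names below are assumptions for the sketch, not the actual ORT Server schema (PostgreSQL syntax assumed).

```kotlin
import java.sql.DriverManager

// Hypothetical cross-run statistics query; table and column names are invented
// for illustration and do not reflect the real ORT Server schema.
fun printRunsWithIssuesPerRepository(jdbcUrl: String, user: String, password: String) {
    val sql = """
        SELECT r.repository_url, COUNT(DISTINCT run.id) AS runs_with_issues
        FROM ort_runs run
        JOIN repositories r ON r.id = run.repository_id
        JOIN issues i ON i.ort_run_id = run.id
        WHERE run.created_at >= now() - INTERVAL '30 days'
        GROUP BY r.repository_url
        ORDER BY runs_with_issues DESC
    """.trimIndent()

    DriverManager.getConnection(jdbcUrl, user, password).use { connection ->
        connection.prepareStatement(sql).use { statement ->
            statement.executeQuery().use { resultSet ->
                while (resultSet.next()) {
                    println("${resultSet.getString("repository_url")}: " +
                        resultSet.getInt("runs_with_issues"))
                }
            }
        }
    }
}
```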
-
IMHO, the data model used by ORT is not very well suited to the requirements of a scalable server solution. Actually, the current implementation of the workers more or less tries to achieve what you describe: to represent an ORT result in the SQL database, to update it on each step, and to use it as input for the next step. The complexity you mention comes from the fact that it is really hard to represent the ORT result data in an efficient and somewhat normalized way in an SQL database. So, I think that for the future we should rather investigate where we could deviate from the 1:1 representation of the ORT data model in SQL to make access to and handling of the data easier and more efficient. Maybe this could also lead to changes in ORT itself. For instance, the fact that each ORT component requires a full result in memory is a hard limit on the size of projects that can be analyzed and also prevents optimizations like more fine-grained, parallel processing of single packages.
-
I think we need a fundamental architectural change: no longer trying to translate/map ORT results (the output of the stages of the ORT pipeline) to SQL database structures, but instead storing them as simply as possible. In addition, after each run we can read the final ORT result and extract the statistical data we are interested in into a statistics database that is optimized for querying (sort, filter, date/time), as sketched below.
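A minimal sketch of the second half of that idea, with all names invented for illustration: the stored ORT result stays untouched, and only a small derived row per run goes into a table that is cheap to sort and filter.

```kotlin
import java.sql.Connection

// Invented statistics record; the real numbers would be derived from the final ORT
// result, e.g. by counting packages, issues, and vulnerabilities after reading it once.
data class RunStatistics(
    val ortRunId: Long,
    val packageCount: Int,
    val issueCount: Int,
    val vulnerabilityCount: Int
)

// Write the derived statistics into a dedicated, query-optimized table; the table and
// column names are assumptions for this sketch, not an existing ORT Server schema.
fun storeRunStatistics(connection: Connection, stats: RunStatistics) {
    val sql = """
        INSERT INTO run_statistics (ort_run_id, package_count, issue_count, vulnerability_count)
        VALUES (?, ?, ?, ?)
    """.trimIndent()

    connection.prepareStatement(sql).use { statement ->
        statement.setLong(1, stats.ortRunId)
        statement.setInt(2, stats.packageCount)
        statement.setInt(3, stats.issueCount)
        statement.setInt(4, stats.vulnerabilityCount)
        statement.executeUpdate()
    }
}
```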
-
It took me some time to realize that one prominent goal of ORT Server is to use ORT as a library, while I was thinking it just provides an environment to run ORT via the command line interface (CLI). So what I had in mind was a totally different approach. I will therefore close this idea now.
-
Context
What is the original scope of ORT Server (from https://github.com/eclipse-apoapsis/ort-server):
A scalable server implementation of the OSS Review Toolkit.
The Eclipse Apoapsis project's ORT Server is a standalone application to deploy the OSS Review Toolkit as a service in the cloud.
What does it mean:
So the scope of ORT Server is to provide an environment where the ORT pipeline stages can be executed in a Kubernetes cloud, with results, logs, and reports stored in the cloud. Nothing less, nothing more.
Problem
So, isn't it as simple as taking the ORT result of the previous stage as input, processing it in the current pipeline stage, and writing the ORT result as output? The next stage then again takes this ORT result as input, and so on ...
ORT Result Input --> stage: process in worker --> ORT Result output --> next stage ...
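As a sketch of that pass-through idea (the interface and names are invented here, not the existing worker API): each stage would only add its own section to the result and hand everything else on unchanged.

```kotlin
// Illustrative pass-through pipeline; PipelineStage and runPipeline are invented
// names, not the real ORT Server worker abstractions.
interface PipelineStage {
    val name: String

    // Takes the serialized ORT result of the previous stage and returns it with only
    // this stage's own additions applied.
    fun process(ortResult: String): String
}

fun runPipeline(initialOrtResult: String, stages: List<PipelineStage>): String =
    stages.fold(initialOrtResult) { result, stage ->
        println("Running stage '${stage.name}'")
        stage.process(result)
    }
```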
The problem is that, in the meantime, there is a lot of code that:
This means that ORT Server explicitly modifies the results of pipeline stages in one way or another, instead of just handling them transparently and unmodified as they are returned from ORT core.
This way, you can no longer be sure that the scan results from a traditional ORT pipeline are the same as the ones you get when you scan a repository with ORT Server.
Effects of this can already be seen in the UI: Issues and Vulnerabilities are displayed that differ from the ones that are generated by the ORT Reporters, because they operate on different data.
Proposal
Benefit
Challenges