feat: Blog posts for OSSF Scorecard launch

Signed-off-by: John McBride <[email protected]>
jpmcb committed Aug 5, 2024
1 parent e5a505b commit f24f18d

Showing 6 changed files with 363 additions and 0 deletions.
128 changes: 128 additions & 0 deletions blog/2024/2024-08-06-introducing-ossf-scorecard.md
@@ -0,0 +1,128 @@
---
title: "Introducing OpenSSF Scorecard for OpenSauced"
tags: ["open source security foundation", "openssf", "openssf scorecard", "open source", "open source compliance", "open source security"]
authors: jpmcb
slug: introducing-ossf-scorecard
description: "Learn how OpenSauced integrates OpenSSF Scorecard to enhance open source security and compliance."
---

In September of 2022, the European Parliament introduced the [“Cyber Resilience Act”](https://digital-strategy.ec.europa.eu/en/policies/cyber-resilience-act),
commonly called the CRA: a new piece of legislation that requires anyone providing
digital products in the EU to meet certain security and compliance requirements.

<!-- truncate -->

But there’s a catch: before the CRA, companies providing or distributing software
typically took on most of the risk of ensuring that safe and reliable software
was shipped to end users. Now, software maintainers further down the supply
chain will have to carry more of that weight. Not only may some open source
maintainers need to meet certain requirements, but they may also have to provide an
up-to-date security profile of their project.

[As the Linux Foundation puts it](https://www.linuxfoundation.org/blog/understanding-the-cyber-resilience-act):

> The Act shifts much of the security burden onto those who develop software,
> as opposed to the users of software. This can be justified by two assumptions:
> first, software developers know best how to mitigate vulnerabilities and distribute
> patches; and second, it’s easier to mitigate vulnerabilities at the source than
> requiring users to do so.

There’s a lot to unpack in the CRA, and it’s still not clear how individual open
source projects, maintainers, foundations, or companies will be directly impacted.
But it is clear that the broader open source ecosystem needs easier ways to understand
the security risk of projects deep within dependency chains. With all that in mind,
we are very excited to introduce OpenSSF Scorecard ratings within the OpenSauced
platform.

## What is the OpenSSF Scorecard?

The OpenSSF is [the Open Source Security Foundation](https://openssf.org/): a multidisciplinary group of
software developers, industry leaders, security professionals, researchers, and
government liaisons. The OpenSSF aims to enable the broader open source ecosystem
“to secure open source software for the greater public good.” They interface with
critical personnel across the software industry to fight for a safer technological
future.

[The OpenSSF Scorecard project](https://github.com/ossf/scorecard) is an effort
to unify the best practices open source maintainers and consumers should use to
judge whether their code, practices, and dependencies are safe. Ultimately, the “scorecard”
command line interface gives anyone the ability to inspect repositories, run “checks”
against those repos, and derive an overall score for the risk profile of a project.
It’s a very powerful tool that gives you a general picture of where a piece
of software may be risky. It can also be a great starting point for any open
source maintainer to develop better practices and find out where they may need to
make improvements. By providing a standardized approach to assessing open source
security and compliance, the Scorecard helps organizations more easily identify
supply chain risks and meet regulatory requirements.
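
For example, scoring a repository with the CLI is a single command. This is a
minimal sketch; see [the Scorecard repository](https://github.com/ossf/scorecard)
for the full set of flags and checks:

```bash
# the CLI needs a GitHub token to query the repository's metadata
export GITHUB_AUTH_TOKEN=<your token>

# run the checks against a repo and print the results as JSON
scorecard --repo=github.com/ossf/scorecard --format=json
```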

## OpenSauced OpenSSF Scorecards

Using the scorecard command line interface as a cornerstone, we’ve built infrastructure
and tooling that enables OpenSauced to capture scores for nearly all repositories on
GitHub. Anything over a 6 or 7 is generally considered safe to use, with no glaring
issues. Scores of 9 or 10 indicate projects doing phenomenally well. And projects with
lower scores should be inspected closely to understand what’s gone wrong.

Scorecards are enabled across all repositories. With this integration, we aim to
make it easier for software maintainers to understand the security posture of their
project and for software consumers to be assured that their dependencies are safe
to use.

Starting today, you can see the score for any project within individual [Repository Pages](https://opensauced.pizza/docs/features/repo-pages/).
For example, in [kubernetes/kubernetes](https://app.opensauced.pizza/s/kubernetes/kubernetes),
we can see the project is safe for use:

![Kubernetes Scorecard](../../static/img/kubernetes-scorecard.png)

Let’s look at another example: [crossplane/crossplane](https://app.opensauced.pizza/s/crossplane/crossplane).
These maintainers are doing an awesome job of following best practices for open
source security and compliance!

![Crossplane Scorecard](../../static/img/crossplane-scorecard.png)

The checks that the OpenSSF Scorecard runs cover a wide range of common
open source security practices, both “in code” and in the maintenance of the
project: for example, code review best practices, whether “dangerous
workflows” are present (like untrusted code being checked out and run during CI/CD),
whether the project is actively maintained, the use of signed releases, and many more.

## The Future of OpenSSF Scorecards at OpenSauced

We plan to bring the OpenSSF Scorecard to more of the OpenSauced platform as we
aim to be the definitive place for open source security and compliance for maintainers
and consumers. As part of that, we’ll be adding more detail to the OpenSSF Scorecard,
showing how the individual checks are scored:

![Future Scorecard](../../static/img/future-scorecard.png)

We’ll also be bringing OpenSSF Scorecard to our premium offering, [Workspaces](https://opensauced.pizza/docs/features/workspaces/):

![Bottlerocket Scorecard Workspace](../../static/img/future-scorecard-workspaces.png)

Within a Workspace, you’ll soon be able to see how the projects you are tracking
stack up against one another on open source security and compliance. You can use
the OpenSSF score together with all the Workspace insights and metrics, all in one
single dashboard, to get a good picture of what’s happening within a set of
repositories and what their security posture is. In this example, I’m tracking
all the repositories within the bottlerocket-os org on GitHub, a security-focused,
Linux-based operating system: I can see that each of the repositories has a good
rating, which gives me greater confidence in the maintenance status and security
posture of this ecosystem. This also gives stakeholders and maintainers of Bottlerocket
a bird’s-eye snapshot of the compliance and maintenance status of the entire org.

As the CRA and similar regulations push more of the security burden onto developers,
tools like the OpenSSF Scorecard become invaluable. They offer a standardized, accessible
way to assess and improve the security of open source projects, helping maintainers
meet new compliance requirements and giving software consumers confidence in their
choices.

Looking ahead, we're committed to expanding these capabilities at OpenSauced. By
providing comprehensive security insights, from individual repository scores to
organization-wide overviews in Workspaces, we're working to create a more secure
and transparent open source ecosystem: one where anyone in the open source community
can better understand their software dependencies, feel empowered to make meaningful
changes where needed, and where open source maintainers have helpful tools to better
maintain their projects.

Stay saucy!
235 changes: 235 additions & 0 deletions blog/2024/2024-08-08-ossf-scorecard-technical-deep-dive.md
@@ -0,0 +1,235 @@
---
title: "Using Kubernetes jobs to scale OpenSSF Scorecard"
tags: ["open source security foundation", "openssf", "openssf scorecard", "open source", "open source compliance", "open source security", "kubernetes", "kubernetes jobs"]
authors: jpmcb
slug: ossf-scorecard-technical-deep-dive
description: "Learn how OpenSauced uses Kubernetes to scale the OpenSSF Scorecard."
unlisted: true
---

We recently released integrations with the [OpenSSF Scorecard on the OpenSauced platform](https://opensauced.pizza/blog/introducing-ossf-scorecard).
The OpenSSF Scorecard is a powerful Go command line interface that anyone can use
to begin understanding the security of their projects and dependencies. It runs
several checks for dangerous workflows, CI/CD best practices, whether the project is
still maintained, and much more. This enables software builders and consumers to
understand their security posture, deduce whether a project is safe to use, and
determine where security practices need to be improved.

<!-- truncate -->

But one of our goals with integrating the OpenSSF Scorecard into the OpenSauced
platform was to make it available to the broader open source ecosystem at large:
if it’s a repository on GitHub, we wanted to be able to display a score for it.
This meant scaling the Scorecard CLI to target nearly any repository on GitHub.
Much easier said than done!

In this blog post, let’s dive into how we did that using Kubernetes and what technical
decisions we made with implementing this integration.

## Technical decisions

We knew that we would need to build a cron-style microservice that would frequently
update scores across a myriad of repositories: the real question was how we would
do that. It wouldn't make sense to run the Scorecard CLI ad hoc: the platform could
too easily get overwhelmed, and we wanted to be able to do deeper analysis on scores
across the open source ecosystem even if a repo’s OpenSauced page hasn’t been
visited recently. Initially, we looked at using the Scorecard Go library as a direct
code dependency and running scorecard checks within a single, monolithic microservice.
We also considered using serverless jobs to run one-off scorecard containers that
would return the results for individual repositories.

The approach we ended up landing on, which marries simplicity, flexibility, and
power, is to use Kubernetes Jobs at scale, all managed by a “scheduler” Kubernetes
controller microservice. Instead of building a deeper code integration with scorecard,
running one-off Kubernetes Jobs gives us the same benefits as a serverless approach,
but at reduced cost, since we manage it all directly on our own Kubernetes cluster.
Jobs also offer a lot of flexibility in how they run: they can have long, extended
timeouts, they can use disk, and, like any other Kubernetes paradigm, they can have
multiple pods doing different tasks.

Let’s break down the individual components of this system and see how they work
in depth:

## Building the Kubernetes controller

The first and biggest part of this system is the “scorecard-k8s-scheduler”: a Kubernetes
controller-like microservice that kicks off new Jobs on-cluster. While this microservice
follows many of the principles, patterns, and methods used when building a traditional
Kubernetes controller or operator, it does not watch for or mutate custom resources
on the cluster. Its function is simply to kick off Kubernetes Jobs that run the Scorecard
CLI and to gather the finished job results.

Let’s look first at the main control loop in the Go code. This microservice uses
the Kubernetes Client-Go library to interface directly with the cluster the microservice
is running on: this is often referred to as an on-cluster config and client. Within
the code, after bootstrapping an on-cluster config and client, we poll for repositories
in our database that need updating. Once some repos are found, we kick off Kubernetes
jobs on individual worker “threads” that will wait for each job to finish.
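
Bootstrapping that on-cluster config and client follows the standard Client-Go
pattern. A minimal sketch, with error handling pared down to the essentials:

```go
import (
	"log"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// uses the pod's mounted service account to talk to the cluster's API server
config, err := rest.InClusterConfig()
if err != nil {
	log.Fatal(err)
}

clientset, err := kubernetes.NewForConfig(config)
if err != nil {
	log.Fatal(err)
}
```

With that client in hand, the main control loop looks like this: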

```go
// buffered channel, sort of like a semaphore, for threaded work
sem := make(chan bool, numConcurrentJobs)

// continuous control loop
for {
	// acquire a slot: this send blocks whenever the buffered channel is full,
	// capping the number of concurrent workers
	sem <- true

	go func() {
		// release the slot on the channel for this Go routine when done
		defer func() {
			<-sem
		}()

		// grab repo needing update, start scorecard Kubernetes Job on-cluster,
		// wait for results, etc. etc.

		// sleep the configured amount of time to relieve backpressure
		time.Sleep(backoff)
	}()
}
```

This “infinite control loop” pattern, with a buffered channel, is a common way in
Go to continuously do work using only a configured number of threads. The number
of concurrent Go funcs running at any one time depends on the configured value of
the “numConcurrentJobs” variable: it sets the capacity of the buffered channel,
which then acts as a worker pool denoting how many concurrent Go funcs may run at
once. Since the buffered channel is a shared resource that all threads can use and
inspect, I often like to think of it as a semaphore: a resource, much like a mutex,
that multiple threads can attempt to lock on and access. In our production environment,
we run many of these scheduler threads at once. Since the scheduler itself isn’t
very computationally heavy, and just kicks off jobs and waits for results to eventually
surface, we can push the envelope of what it can manage. We also have
a built-in backoff system that attempts to relieve pressure when needed: it
increments the configured “backoff” value if there are errors or if there are
no repos found that need a score calculated. This ensures we’re not continuously
slamming our database with queries, and the scorecard scheduler itself can remain
in a “waiting” state, not taking up precious compute resources on the cluster.
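
The backoff behavior might look something like this. A sketch only: the growth
strategy and thresholds are configuration details, and these names are illustrative:

```go
// nextBackoff grows the sleep interval when a loop iteration found no work (or
// hit an error), and resets it to the base value once a repo is processed
func nextBackoff(current, base, max time.Duration, hadWork bool) time.Duration {
	if hadWork {
		return base
	}

	next := current + base
	if next > max {
		return max
	}

	return next
}
```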

Within the control loop, we do a few things: first, we query our database for repositories
needing their scorecard updated. This is a simple database query based on
some timestamp metadata we watch for and have indexes on. Once a configured amount
of time has passed since the last score was calculated for a repo, it bubbles up
to be crunched by a Kubernetes Job running the Scorecard CLI.
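
As a sketch, assuming a Postgres table with a “last_scored_at” timestamp column
(the table, columns, and variables here are illustrative, not our exact schema),
that query might look like:

```go
// find a batch of repos whose score is stale or has never been calculated
rows, err := db.QueryContext(ctx, `
	SELECT id, full_name
	FROM repos
	WHERE last_scored_at IS NULL
	   OR last_scored_at < NOW() - INTERVAL '14 days'
	ORDER BY last_scored_at ASC NULLS FIRST
	LIMIT $1`, batchSize)
```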

## Kicking off Scorecard jobs

Next, once we have a repo to get the score for, we kick off a Kubernetes Job using
the “gcr.io/openssf/scorecard” image. Bootstrapping this Job in Go code with Client-Go
looks very similar to how it would look in YAML, just using the various libraries
and APIs available via “k8s.io” imports and doing it programmatically:

```go
// defines the Kubernetes Job and its spec
job := &batchv1.Job{
	// structs and details for the actual Job including metav1.ObjectMeta and batchv1.JobSpec
}

// create the actual Job on cluster using the in-cluster config and client
return s.clientset.BatchV1().Jobs(ScorecardNamespace).Create(ctx, job, metav1.CreateOptions{})
```
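
For a fuller picture, here is a sketch of what that Job definition could look
like. The image tag, arguments, limits, and the `jobName` and `repoURL` inputs
are illustrative, not our exact production spec:

```go
// a small helper for the *int32 fields on the Job spec
func int32Ptr(i int32) *int32 { return &i }

// in practice a GitHub token would also be injected into the container
// via a secret-backed environment variable
job := &batchv1.Job{
	ObjectMeta: metav1.ObjectMeta{
		Name:      jobName,
		Namespace: ScorecardNamespace,
	},
	Spec: batchv1.JobSpec{
		BackoffLimit:            int32Ptr(1),    // don't retry failed runs endlessly
		TTLSecondsAfterFinished: int32Ptr(3600), // let Kubernetes garbage collect finished jobs
		Template: corev1.PodTemplateSpec{
			Spec: corev1.PodSpec{
				RestartPolicy: corev1.RestartPolicyNever,
				Containers: []corev1.Container{
					{
						Name:  "scorecard",
						Image: "gcr.io/openssf/scorecard:stable", // illustrative tag
						Args:  []string{"--repo=" + repoURL, "--format=json"},
					},
				},
			},
		},
	},
}
```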

After the Job is created, we wait for it to signal that it has completed or errored.
Much like with kubectl, Client-Go offers a helpful way to “watch” resources and
observe their state as it changes:

```go
// watch selector for the job name on cluster
watch, err := s.clientset.BatchV1().Jobs(ScorecardNamespace).Watch(ctx, metav1.ListOptions{
	FieldSelector: "metadata.name=" + jobName,
})

// continuously pop events off the watch results channel for job status
for event := range watch.ResultChan() {
	// wait for job success, error, or other states
}
```
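
Inside that loop, one way to interpret the events coming off the channel, assuming
we only care about terminal states, is a sketch like this:

```go
for event := range watch.ResultChan() {
	// each event wraps the Job object in its new state
	job, ok := event.Object.(*batchv1.Job)
	if !ok {
		continue
	}

	if job.Status.Succeeded > 0 {
		// success: go fetch the results from the Job's pod logs
		break
	}

	if job.Status.Failed > 0 {
		// failure: record the error and move on to the next repo
		break
	}
}
```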

Finally, once the Job completes successfully, we can grab the results from
the Job’s pod logs, which contain the actual JSON results from the Scorecard
CLI! Once we have those results, we can upsert the scores back into the database
and mutate any necessary metadata to signal to our other microservices or the
OpenSauced API that there’s a new score!
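
A sketch of fetching and parsing those logs with Client-Go: this relies on the
“job-name” label that Kubernetes automatically adds to a Job’s pods, and the
struct below is only a partial model of the fields we might care about from the
Scorecard JSON output.

```go
// find the pod the Job created
pods, err := s.clientset.CoreV1().Pods(ScorecardNamespace).List(ctx, metav1.ListOptions{
	LabelSelector: "job-name=" + jobName,
})
if err != nil || len(pods.Items) == 0 {
	return fmt.Errorf("could not find pod for job %s", jobName)
}

// read the raw pod logs: this is the Scorecard CLI's JSON output
raw, err := s.clientset.CoreV1().Pods(ScorecardNamespace).
	GetLogs(pods.Items[0].Name, &corev1.PodLogOptions{}).
	Do(ctx).Raw()
if err != nil {
	return err
}

// a partial model of the Scorecard JSON results
type scorecardResult struct {
	Score  float64 `json:"score"`
	Checks []struct {
		Name  string `json:"name"`
		Score int    `json:"score"`
	} `json:"checks"`
}

var result scorecardResult
if err := json.Unmarshal(raw, &result); err != nil {
	return err
}
```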

As mentioned before, the scorecard-k8s-scheduler can have any number of concurrent
jobs running at once: in our production setting, that is a large number of jobs,
all managed by this microservice. The intent is to be able to update scores
every 2 weeks across all repositories on GitHub. With this kind of scale, we hope
to provide powerful tooling and insights to any open source maintainer
or consumer!

## Role-based access control

The “scheduler” microservice ends up being a small part of this whole system: anyone
familiar with Kubernetes controllers knows that additional pieces of Kubernetes
infrastructure are needed to make the system work. In our case, we needed some
role-based access control (RBAC) to enable our microservice to create Jobs on the cluster.

First, we need a service account: this is the account that will be used by the
scheduler and have access controls bound to it:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scorecard-sa
  namespace: scorecard-ns
```

We place this service account in our “scorecard-ns” namespace where all of this runs.
Next, we need a Role and RoleBinding for the service account. This includes
the actual access controls (including being able to create Jobs, view pod logs, etc.):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scorecard-scheduler-role
  namespace: scorecard-ns
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "watch", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]

---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scorecard-scheduler-role-binding
  namespace: scorecard-ns
subjects:
  - kind: ServiceAccount
    name: scorecard-sa
    namespace: scorecard-ns
roleRef:
  kind: Role
  name: scorecard-scheduler-role
  apiGroup: rbac.authorization.k8s.io
```

You might be asking yourself: “Why do I need to give this service account access
to get pods and pod logs? Isn’t that overly broad access?”
Remember: Jobs have pods, and in order to get the pod logs that contain the actual
results of the Scorecard CLI, we must be able to list the pods from a Job and then
read their logs!

The second part of this, the “RoleBinding”, is where we actually attach the Role
to the service account. This service account can then be used when kicking off
new jobs on the cluster.
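
One quick way to sanity-check this wiring, assuming you have cluster access, is
to impersonate the service account with kubectl:

```bash
# can the scheduler's service account create Jobs in the scorecard namespace?
kubectl auth can-i create jobs \
  --as=system:serviceaccount:scorecard-ns:scorecard-sa \
  -n scorecard-ns
```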

All in all, this architecture allows us to use the flexibility and power of
serverless-like setups while still taking advantage of the cost savings and existing
infrastructure we have with Kubernetes. Using existing paradigms and components can
be a great way to unlock capabilities you already have within your platform of choice!

Huge shout out to [Alex Ellis](https://github.com/alexellis) and his excellent [run-job controller](https://github.com/alexellis/run-job):
this was a huge inspiration and reference for correctly using Client-Go with Jobs!

Stay saucy!
Binary file added static/img/crossplane-scorecard.png
Binary file added static/img/future-scorecard-workspaces.png
Binary file added static/img/future-scorecard.png
Binary file added static/img/kubernetes-scorecard.png
