Skip to content

Latest commit

 

History

History
925 lines (763 loc) · 29.5 KB

programming.md

File metadata and controls

925 lines (763 loc) · 29.5 KB
title description date
码农碎碎念~
薄雾浓云愁永昼, 瑞脑销金兽. 佳节又重阳, 玉枕纱厨, 半夜凉初透.
2023-07-17
  • 自己在用的一些 Go 的官方命令行工具:
go install golang.org/x/vuln/cmd/govulncheck@latest
go install golang.org/x/tools/cmd/deadcode@latest
  • Rust 的一些容易犯的小错误 (Coding 的时候)
    • 忘记引入相应的 trait



  • bon
    • bon is a Rust crate for generating compile-time-checked builders for functions and structs.
    • 我想说的是: Go 做不到! 哈哈~


We've now seen two different approaches (Push/Pull)
to looping over all the elements of a set.
Different Go packages use these approaches and several others.
That means that when you start using a new Go container package
you may have to learn a new looping mechanism.
It also means that we can't write one function that
works with several different types of containers,
as the container types will handle looping differently.

We want to improve the Go ecosystem by developing
standard approaches for looping over containers.
As of Go 1.23 it now supports ranging over functions that
take a single argument. The single argument must itself be a
function that takes zero to two arguments and returns a bool;
by convention, we call it the yield function.
func(yield func() bool)

func(yield func(V) bool)

func(yield func(K, V) bool)
When we speak of an iterator in Go, we mean a function
with one of these three types. As we'll discuss below,
there is another kind of iterator in the
standard library: a pull iterator.
When it is necessary to distinguish between
standard iterators and pull iterators,
we call the standard iterators push iterators.
As a matter of convention, we encourage all container types
to provide an All method that returns an iterator,
so that programmers don't have to remember whether to range
over All directly or whether to call All
to get a value they can range over.
A pull iterator works the other way around:
it is a function that is written such that each time
you call it, it returns the next value in the sequence.

We'll repeat the difference between the two types
of iterators to help you remember:

A push iterator pushes each value in a sequence to
a yield function. Push iterators are standard iterators
in the Go standard library, and are supported
directly by the for/range statement.

A pull iterator works the other way around. Each time you
call a pull iterator, it pulls another value from a sequence
and returns it. Pull iterators are not supported directly by
the for/range statement; however, it's straightforward to write
an ordinary for statement that loops through a pull iterator.
The first function returned by iter.Pull, the pull iterator,
returns a value and a boolean that reports
whether that value is valid.
The boolean will be false at the end of the sequence.
iter.Pull returns a stop function in case we don't read
through the sequence to the end. In the general case the
push iterator, the argument to iter.Pull, may
start goroutines, or build new data structures that need
to be cleaned up when iteration is complete.

The push iterator will do any cleanup when the yield
function returns false, meaning that no more values
are required. When used with a for/range statement,
the for/range statement will ensure that if the loop
exits early, through a break statement or for any
other reason, then the yield function will return false.

With a pull iterator, on the other hand, there is no way
to force the yield function to return false,
so the stop function is needed.
// EqSeq reports whether two iterators contain the same
// elements in the same order.
func EqSeq[E comparable](s1, s2 iter.Seq[E]) bool {
  next1, stop1 := iter.Pull(s1)
  defer stop1()
  next2, stop2 := iter.Pull(s2)
  defer stop2()
  for {
    v1, ok1 := next1()
    v2, ok2 := next2()
    if !ok1 {
      return !ok2
    }
    if ok1 != ok2 || v1 != v2 {
      return false
    }
  }
}

自 GitOps 理念以来, 至少在长驻的任务上, 带来的便利是毋庸置疑的~
之前也实施过一个项目, 基于:
https://github.com/apache/flink-kubernetes-operator
开发了公司内部的 Flink Job 的调度, 也颇有收益~
但是, 对于 Batch Job, GitOps 还合适么?
比如:
https://github.com/kubeflow/spark-operator
https://github.com/apache/spark-kubernetes-operator

我不觉得!

从一个十分粗暴的角度而言, GitOps 就是声明了长驻资源~
一切皆 GitOps 显然不合适.
或许, 简单的原则是:
GitOps 适用于手动 (包括通过一些: 工具/CI/CD) 提交的资源定义.
而这些资源定义, 一般而言是相对不容易变更的.
此处不容易变更, 是相对的, 大体上不会超过 (微) 服务发布的频率.


7 月末, 体验了一下 Axum (Rust) 的 Web 开发.
说实话, 纯粹个人的角度, 比 Hertz (Go) 要好得多!

但是, 很快发现意义不大, 因为, Rust 的生态优势目前有三处:
1. 区块链/密码学/同态加密 (隐私计算)
2. 围绕着 Arrow(-rs) 与 DataFusion 的数据生态
3. Rust powered 的 Python 生态

其实绝大多数纯 Web 开发者很大概率不会接受 Rust 的学习曲线.
所以, 除非是个人开发者或者 (独立) 开源项目,
否则, Rust 的纯 Web 开发意义不大~
(包括用 Rust 写 K8s operator 等.)

嗯, 如果团队成员大多数是 Rust 掌握者呢? 比如: Data, ML 团队~
再熟练的 Rust 玩家在 Web 开发领域得到的优势也抵不过消耗.
(比如: 编译时间, 编译报错的处理等.)

打一个不恰当的比喻: Rust 大多处于 Data Plane (数据处理, 计算密集).
(Data Plane 对立面是 Control Plane)

有一个特殊情况, 很多 Rust 编写的数据计算组件,
K8s operator 也就用 Rust 写了, 单一代码库, 同一种语言, 也合理.
否则, 其实我不觉得 Rust 写 K8s operator 有任何好处~

// The problem was that those references might be
// self-references, meaning they point to
// other fields of the same object.
async fn foo<'a>(z: &'a mut i32) {
  // ...
}

async fn bar(x: i32, y: i32) -> i32 {
  let mut z = x + y;
  foo(&mut z).await;
  z
}
// Let's ask ourselves, what would the internal
// states of `Bar` be? Something like this:

enum Bar {
  // When it starts, it contains only its arguments
  Start { x: i32, y: i32 },

  // At the first await, it must contain `z` and
  // the `Foo` future that references `z`
  FirstAwait { z: i32, foo: Foo<'?> }

  // When its finished it needs no data
  Complete,
}
// The `Foo` object instead borrows the `z` field of `Bar`,
// which is stored along side it in the same struct.
// This is why these future types are said to be
// "self-referential:" they contain fields which
// reference other fields in themselves.

  • 特别喜欢 Dependi 这个 VSCode 插件
    • 原本的插件 Crates 只支持 Rust (archived)
    • Now: Rust, Go, JavaScript, Python

  • BLAKE3
    • Much faster than MD5, SHA-1, SHA-2, SHA-3, and BLAKE2.
    • Secure, unlike MD5 and SHA-1. And secure against length extension, unlike SHA-2.
    • Highly parallelizable across any number of threads and SIMD lanes, because it's a Merkle tree on the inside.
    • Go lukechampine/blake3
简单 bench 了下

(1_000_000 次):
sha2 (256): 6666.56 ms
md5:        1252.75 ms
blake3:     417.502 ms

(10_000_000 次):
sha2 (256)  66.40 s
md5         12.74 s
blake3      3.47  s

为啥不 WASM 的那一段很务实.

But there is a workaround: use the C ABI.
Unlike Rust, C does have a stable ABI on
every major OS and processor architecture.
So if we can constrain our plugin interface
to only use C-compatible data structures and
functions we can safely link against plugins
compiled by any Rust compiler.

Even better: as the C ABI is the lingua franca
in the systems world, many other languages are
able to emit it, opening the door to supporting
UDFs in a variety of compiled languages.
  • 时至今日, 个人承认 Hertz 算是一个可用的 Go Web 框架吧~

    • 首先, Go 的语言特性限制, 导致不可能出现功能丰富的 Web 框架; 所以, Go 的 Web 框架基本属于大同小异.
    • 但是 Hertz 属于在大同小异之中做到了细节完善度较高~ 比如: binding & validation.
  • Introducing Istio v1 APIs

    • Reflecting the stability of Istio's features, our networking, security and telemetry APIs are promoted to v1 in 1.22.
    • 嗯, 我也远离了 Istio 了, 哈哈哈~

  • 2024-05, 正好手头有个场景, 简单 benchmark 了一下 NetworkX vs Raphtory vs rustworkx

    • rustworkx 这个名字起的不好~
    • Python 3.12
    • rustworkx 可以添加 object 作为 node, 但是会导致 .neighbors(node_id) 显著变慢; 也有可能是 .get_node_data(node_id) 导致的. 没有细究~
    • 同样的使用模式下, 不包含构图过程, 最简单的 neighbors(), 粗略估计: NetworkX (6.864 s) 耗时是 rustworkx (0.728 s) ~9.4 倍; Raphtory (0.748 s) 耗时与 rustworkx (0.728 s) 基本持平.
    • 但是, 查询结果上, Raphtory 与 NetworkX 相对接近; rustworkx 则有一定的差异, 想来准确性有待提升.
  • 云风的 Blog: 重新启程

2018 年开始, 我决定安心做一点想做而擅长的事.
人生短暂, 学习如何管理很多人做事并非我期望的发展方向.
尤其当我逐步融入开源社区后, 我发现,
这个世界上许多软件基础设施往往都是由一两个人支撑.
早在 2011 年时, 我就怀疑过, 软件项目需要很多人一起完成可能是一个骗局,
那么, 当处于一个稳定的环境而自己又有能力时,
这种机遇并不多见, 就应该尝试做点什么.

The typical approach in Luminal for supporting new backends would be:

1. Swap out each primitive operation with a backend-specific operation.
2. Add in operations to copy to device and copy from device
   before and after Function operations.
3. Pattern-match to swap out chunks of
   operations with specialized variants.
4. All other optimizations.
One more note: The core of Luminal has no idea about any of this!
GPUs are a foreign concept to it, which is nessecary since we
want to add backends to TPUs, Groq chips, and whatever else
may come in the future without changing anything in the core.

拭目以待!


Our new generator, which we unimaginatively named ChaCha8Rand
for specification purposes and implemented as
math/rand/v2's rand.ChaCha8,
is a lightly modified version of ChaCha stream cipher.
ChaCha is widely used in a 20-round form called ChaCha20,
including in TLS and SSH.
We used ChaCha8 as the core of ChaCha8Rand.
Most stream ciphers, including ChaCha8, work by defining a
function that is given a key and a block number and produces a
fixed-size block of apparently random data.
The cryptographic standard these aim for (and usually meet) is
for this output to be indistinguishable from actual random data in
the absence of some kind of exponentially costly brute-force search.
A message is encrypted or decrypted by XOR'ing successive blocks of
input data with successive randomly generated blocks.
To use ChaCha8 as a rand.Source, we use the generated blocks directly
instead of XOR'ing them with input data
(this is equivalent to encrypting or decrypting all zeros).
We changed a few details to make ChaCha8Rand more
suitable for generating random numbers.
The Go runtime now maintains a per-core ChaCha8Rand state
(300 bytes), seeded with operating system-supplied
cryptographic randomness, so that random numbers can be
generated quickly without any lock contention.
Dedicating 300 bytes per core may sound expensive,
but on a 16-core system, it is about the same as storing
a single shared Go 1 generator state (4,872 bytes).
The speed is worth the memory.
Overall, ChaCha8Rand is slower than the Go 1 generator,
but it is never more than twice as slow,
and on typical servers, the difference is never more than 3ns.
Very few programs will be bottlenecked by this difference,
and many programs will enjoy the improved security.

  • GQL Database Language
    • ISO/IEC 39075 Database Language GQL
    • 其实也带来了命名规范:
    • label, not tag
    • property: pairs of names and values
    • node, not vertex
    • edge, not relationship
The GQL standard does not specify how the
returned data is displayed to the user.
MATCH ((a)-[r]->(b)){1, 5}
RETURN a, r, b

-- This example will find paths where one node
-- knows another node, up to five hops long.
Nodes are enclosed in parenthesis while
edges are enclosed in square brackets.
INSERT (:Person {
  firstname: 'Avery',
  lastname: 'Stare',
  joined: date("2022-08-23")
})
- [:LivesIn {
  since: date("2022-07-15")
}]
-> (:City {
  name: 'Granville',
  state: 'OH',
  country: 'USA'
})
MATCH (a {
  firstname: 'Avery'
}), (d {
  name: 'Unique'
})
INSERT (a) - [:HasPet] -> (d)
-- GQL data is deleted by identifying nodes,
-- detaching them to delete relationships,
-- then deleting the nodes.
MATCH (a {firstname: 'Avery'}) - [b] -> (c)
DETACH DELETE a, c
A schema-free graph will accept any data that is inserted.
This allows for quick startup but leaves the control of
the data with the application developer(s) and/or users.

  • Fluence

    • Fluence is a decentralized serverless computing platform.
    • 2024-04-15, 因 Fluence Developer Reward Airdrop 结缘~ 祝好!
  • Loco


This milestone represents a key transition in
Ethereum's long-term roadmap:

blobs are the moment where Ethereum scaling ceased to be
a "zero-to-one" problem, and became a "one-to-N" problem.
The next stage is likely to be a simplified version of
DAS called PeerDAS. In PeerDAS, each node stores a significant
fraction (eg. 1/8) of all blob data, and nodes maintain
connections to many peers in the p2p network.
When a node needs to sample for a particular piece of data,
it asks one of the peers that it knows is
responsible for storing that piece.
// rustc 1.77.0
alignment of i128: 16
func (r *TheController) SetupWithManager(mgr ctrl.Manager) error {
  return ctrl.NewControllerManagedBy(mgr).
    For(&TheObject{}).
    // This is useful because we don't want to
    // reconcile again when the generation is not changed.
    // The generation is changed when the spec is updated.
    // The generation is not changed when the status is updated.
    WithEventFilter(predicate.GenerationChangedPredicate{}).
    Complete(r)
}
For those readers familiar with transformers
and eager for the punchline, here it is:

Each transformer block (containing a multi-head self-attention
layer and feed-forward network) learns weights that associate a
given prompt with a class of strings found in the training corpus.
The distribution of tokens that follow those strings in the
training corpus is, approximately, what the block outputs as
its predictions for the next token.

Each block may associate the same prompt with a different
class of training corpus strings, resulting in a different
distribution of the next tokens and thus different predictions.
The final transformer output is a linear combination of
each block's predictions.
The takeaway is that simplifying the transformation performed
by the blocks to just the contributions of the feed-forward
networks results in a shorter output vector (has a smaller norm)
than the original output but points in roughly the same direction.

And the difference in norms would have no impact on the
transformer's final output, because of the LayerNorm operation
after the stack of blocks. That LayerNorm step will adjust the
norm of any input vector to a similar value regardless of its
initial magnitude; the final linear layer that follows it will
always see inputs of approximately the same norm.
I think the model has learned a complex, non-linear embedding
subspace corresponding to each token. Any embedding within that
subspace results in an output distribution that assigns the
token near a certain probability.
Each embedding I was able to learn is probably a point in
the embedding subspace for the corresponding token.
Within a block, adding the feed-forward network output
vector to the input produces an output embedding that
better aligns with the embedding subspaces of specific tokens.

And those tokens are the same ones predicted in the approximation:
they're the tokens that follow the strings in the training
corpus that yield similar feed-forward network
outputs to the current prompt.

哈哈, 作者蛮逗的~ 结论一般, 过程值得尊敬~

Resource savings are nice to have, but the real power of
Flink Autotuning is the reduced time to production.

With Flink Autoscaling and Flink Autotuning, all users
need to do is set a max memory size for the TaskManagers,
just like they would normally configure TaskManager memory.
Flink Autotuning then automatically adjusts the various
memory pools and brings down the total container memory size.
It does that by observing the actual max memory usage on
the TaskMangers or by calculating the exact number of
network buffers required for the job topology.
The adjustments are made together with Flink Autoscaling,
so there is no extra downtime involved.

很实用的功能, 实际效果有待检验!

Arroyo 0.10 ships as a single, compact binary that
can be deployed in a variety of ways.
Our first decision was to adopt Apache Arrow as our in-memory
data representation, replacing the static Struct types.
Arrow is a columnar, in-memory format designed for
analytical computations. The coolest thing about Arrow is that
it's a cross-language standard; it supports sharing data
directly between engines and even different languages without
copying or serialization overhead.
For example, Pandas programs written in Python could
operate directly on data generated by Arroyo.
The takeaway: we only have to pay high overhead of small
batch sizes when our data volume is very low.
But if we're only handling 10 or 100 events per second,
the overall cost of processing will be very small in any case.
And at high data volumes (tens of thousands to millions of
events per second) we can have our cake and eat it too-achieve
high throughput with batching and columnar data while
still maintaining low absolute latency.
Now that Arroyo compiles down to a single binary,
we're working to remove the other external dependencies,
including Postgres and Prometheus;
future releases of Arroyo will have the option of running
their control plane on an embedded sqlite database.
  • River

    • Reverse Proxy Application, based on the Pingora library from Cloudflare.
    • 嗯, 期待下一步, API Gateway!
  • Blixt

    • An experimental layer 4 load-balancer for Kubernetes.
    • The control-plane is built using Gateway API and written in Golang with Operator SDK/Controller Runtime.
    • The data-plane is built using eBPF and is written in Rust using Aya.
  • Robust generic functions on slices

    • Delete need not allocate a new array, as it shifts the elements in place. Like append, it returns a new slice.
    • Many other functions in the slices package follow this pattern, including Compact, CompactFunc, DeleteFunc, Grow, Insert, and Replace.
    • When calling these functions we must consider the original slice invalid, because the underlying array has been modified.
    • go vet 应该检测这些~
    • Out of pragmatism, we chose to modify the implementation of the five functions Compact, CompactFunc, Delete, DeleteFunc, Replace to "clear the tail".
    • The code changed in the five functions uses the new built-in function clear (Go 1.21) to set the obsolete elements to the zero value of the element type.
first, second, third, fourth := 11, 22, 33, 44
s := []*int{&first, &second, &third, &fourth}

if len(s) >= 4 {
  s = slices.Delete(s, 2, 3)
  fmt.Println("New length is", len(s))
}

for _, v := range s {
  fmt.Println(*v)
}

// New length is 3
// 11
// 22
// 44
first, second, third, fourth := 11, 22, 33, 44
s := []*int{&first, &second, &third, &fourth}

if len(s) >= 4 {
  s := slices.Delete(s, 2, 3)
  fmt.Println("New length is", len(s))
}

for _, v := range s {
  fmt.Println(*v)
}

// New length is 3
// 11
// 22
// 44
// panic: runtime error: invalid memory address or nil pointer dereference

  • Go 1.22 Release Notes
    • 春节前~
    • Functions that shrink the size of a slice (Delete, DeleteFunc, Compact, CompactFunc, and Replace) now zero the elements between the new length and the old length.
type Item struct {
  Name   string
  Amount int
}

items := []*Item{
  {Name: "Car", Amount: 1},
  {Name: "Car", Amount: 1},
}

l1 := len(slices.CompactFunc(items, func(a *Item, b *Item) bool {
  return a.Name == b.Name
}))

l2 := len(slices.CompactFunc(items, func(a *Item, b *Item) bool {
  return a.Amount == b.Amount
}))

fmt.Println(l1, l2)

// Go 1.21:
// 1 1
// Go 1.22:
// panic: runtime error: invalid memory address or nil pointer dereference

这个 Case 因为我使用 ent, 比较容易出现 []*ent.Entity. 所以我切换到了 lo.UniqBy. 一方面是 slices 目前无法替代 lo; 另一方面是 lo 使用体验更加.


  • Arroyo: What is stateful stream processing?

    • In stream processing, statelessness also goes hand-in-hand with a property that I'll call map-only. This means that there are no operations that require reorganizing ("shuffling") or sorting data; only operators that are like "map" or "filter" (in SQL terms, SELECT and WHERE) are supported. In particular, GROUP BY, JOIN, and ORDER BY can't be implemented.
    • 10x faster sliding windows: how our Rust streaming engine beats Flink
    • Early stateful systems like Flink and ksqlDB were designed at a time when memory was expensive and networks were slow. They relied on embedded key-value stores like RocksDB in order to provide large, relatively fast storage. However, in practice many users rely on the in-memory backend due to the complexity of tuning RocksDB.
    • ... due to the complexity of tuning RocksDB. 哈哈哈, 蛮现实的~
    • While Flink supports storing TBs of state in RocksDB, in practice this proves operationally difficult because of the need to load all of the state onto the processing nodes.
    • Newer systems like Rising Wave and Arroyo have adopted remote state backends that allow only live data to be loaded onto the processing nodes which enables much faster operations at large state sizes.
  • Arroyo 0.9

    • User-defined functions (UDFs) and user-defined aggregate functions (UDAFs) allow you to extend Arroyo with custom logic. New in Arroyo 0.9 is support for what we call async UDFs.
pub async fn get_city(ip: String) -> Option<String> {
  let body: serde_json::Value =
    reqwest::get(format!("http://geoip-service:8000/{ip}"))
      .await
      .ok()?
      .json()
      .await
      .ok()?;

  body.pointer("/names/en")
    .and_then(|t| t.as_str())
    .map(|t| t.to_string())
}
create view cities as
select get_city(logs.ip) as city
from logs;

SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY window
    ORDER BY count DESC
  ) as row_num
  FROM (SELECT count(*) as count,
    city,
    hop(interval '5 seconds', interval '15 minutes') as window
      FROM cities
      WHERE city IS NOT NULL
      group by city, window)
) WHERE row_num <= 5;

  • Go Wiki: Rangefunc Experiment

    • GOEXPERIMENT=rangefunc
  • Previously, the variables declared by a for loop were created once and updated by each iteration.

    • In Go 1.22, each iteration of the loop creates new variables, to avoid accidental sharing bugs.
values := []int{1, 2, 3, 4, 5}
for _, v := range values {
  go func() {
    // go <= 1.21
    // vet: loop variable val captured by func literal
    // 5 5 5 5 5
    // go >= 1.22
    // 2 1 4 5 3 (randomly)
    fmt.Printf("%d ", v)
  }()
}