Easton Man's Channel

News that @EastonMan reads
+ramblings
+admiring the experts
+the occasional cat
+songs Easton listens to

03:51 · Apr 8, 2025 · Tue

Daniel Lemire's blog
How helpful is AI?

Do large language models (AI) make you 3x faster or only 3% faster? The answer depends on the quality of the work you are producing.
If you need something like a stock photo but not much beyond that, AI can make you 10x faster.
If you need a picture taken at the right time of the right person, AI doesn’t help you much.
If you need a piece of software that an intern could have written, AI can do it 10x faster than the intern.
If you need a piece of software that only 10 engineers in the country can understand, AI doesn’t help you much.
The effect is predictable: finding work if your skill level is low becomes more difficult. However, if you are a highly skilled individual, you can eliminate much of the boilerplate work and focus on what matters. Thus, elite people are going to become even more productive.

source

04:36 · Apr 7, 2025 · Mon

Daniel Lemire's blog
Faster shuffling in Go with batching

Telegraph | source

import "math/rand/v2"

func shuffleStandard(data []uint64) {
	rand.Shuffle(len(data), func(i, j int) {
		data[i], data[j] = data[j], data[i]
	})
}

01:43 · Apr 6, 2025 · Sun

Chips and Cheese
Dynamic Register Allocation on AMD's RDNA 4 GPU Architecture
#ChipAndCheese

Telegraph | source
(author: Chester Lam)

Modern GPUs often make a difficult tradeoff between occupancy (active thread count) and register count available to each thread. Higher occupancy provides more thread level parallelism to hide latency with, just as more SMT threads help hide latency on a…

10:21 · Apr 5, 2025 · Sat

https://xxxuuu.me/post/deterministic-simulator

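Deterministic Simulator | x³u³

I first came across the concept of deterministic simulation in a talk at Rust China Conf 2022. I have kept following the field ever since, and have also presented on the topic within my team at Tencent.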

11:41 · Apr 2, 2025 · Wed

https://best.openssf.org/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C++.html

OpenSSF Best Practices Working Group

Compiler Options Hardening Guide for C and C++

The Best Practices for OSS Developers working group is dedicated to raising awareness and education of secure code best practices for open source developers.

18:52 · Apr 1, 2025 · Tue

Chips and Cheese
Inside Nvidia's GeForce 6000 Series
#ChipAndCheese

Telegraph | source
(author: Chester Lam)

2025 has kicked off with a flurry of GPU activity. Intel's Arc B580 revealed that it's still possible to make a mid-range GPU with more than 8 GB of VRAM. AMD's RDNA 4 marked the continuation of a longstanding AMD practice where they reach for the top-end…

12:53 · Mar 30, 2025 · Sun

Et Tu, Grammarly? https://dbushell.com/2025/03/29/et-tu-grammarly/

dbushell.com

Et tu, Grammarly?

The one where I deploy counter defences

10:02 · Mar 29, 2025 · Sat

Daniel Lemire's blog
Mixing ARM NEON with SVE code for fun and profit

Telegraph | source

Most mobile devices use 64-bit ARM processors. A growing number of servers (Amazon, Microsoft) also use 64-bit ARM processors. These processors have special instructions called ARM NEON providing parallelism called Single instruction, multiple data (SIMD).…

08:21 · Mar 28, 2025 · Fri

Chips and Cheese
An Interview with Oxide's Bryan Cantrill
#ChipAndCheese

Telegraph | source
(author: George Cozma)

Hello you fine Internet folks, Today we have an interview with Bryan Cantrill from Oxide Computer Company. Cloud computing has been a tour de force in the computing industry, with many businesses and even governments moving over to cloud services for their…

00:55 · Mar 26, 2025 · Wed

Matt Keeter
The Prospero Challenge

Evaluate a 7866-clause math expression for fame and glory

source
(author: Matt Keeter)

07:33 · Mar 25, 2025 · Tue

Daniel Lemire's blog
Unsigned comparisons using signed types

Telegraph | source

There are two main types of fixed-precision integers in modern software: unsigned and signed. In C++20 and above, the signed integers must use the two’s complement convention. Other programming languages typically specify two’s complement as well. Two’s complement…

23:57 · Mar 24, 2025 · Mon

CYY's Own World
Converting the rootfs to btrfs on an Alibaba Cloud ECS instance running Ubuntu 22.04

# SSH into the server
sudo su
cd /boot
wget http://mirrors.cqu.edu.cn/debian/dists/stable/main/installer-amd64/current/images/netboot/debian-installer/amd64/initrd.gz
wget http://mirrors.cqu.edu.cn/debian/dists/stable/main/installer-amd64/current/images/netboot/debian-installer/amd64/linux
sed -i 's/GRUB_TIMEOUT=5/GRUB_DEFAULT=5/g' /etc/default/grub
sed -i 's/GRUB_TIMEOUT_STYLE=hidden/GRUB_TIMEOUT_STYLE=menu/g' /etc/default/grub
update-grub
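# Open the VNC console from the Alibaba Cloud web console
reboot
# After the reboot, press e at the GRUB boot menu, enter: linux linux; initrd initrd.gz, then press F10
# Proceed through the installer up to the locale, the mirror (for a server in mainland China, pick a mainland mirror), and the root user and password
# Stop at "Partition disks", choose "Go Back", then select "Execute a shell"
cat /proc/partitions
# Confirm that the root partition is still /dev/vda3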
wget https://mirrors.tnonline.net/btrfs/btrfs-progs/x86_64/btrfs-progs-6.9.2-x86_64-static/btrfs-progs-6.9.2-x86_64-static.tar.gz
gunzip btrfs-progs-6.9.2-x86_64-static.tar.gz
tar -xvf btrfs-progs-6.9.2-x86_64-static.tar
fsck.ext4 /dev/vda3 -f
blkid
# Write down the UUID of /dev/vda3 (very important)
# UUID="a9699f99-5614-4444-be92-d2ef6cfdbaf6"
./btrfs-convert.static /dev/vda3
./btrfstune.static -U a9699f99-5614-4444-be92-d2ef6cfdbaf6 /dev/vda3
reboot -f
# After the machine boots, run update-grub again
sudo update-grub

source
(author: Yangyu Chen)

06:07 · Mar 24, 2025 · Mon

Chips and Cheese
RDNA 4's "Out-of-Order" Memory Accesses
#ChipAndCheese

Telegraph | source
(author: Chester Lam)

AMD's RDNA 4 brings a variety of memory subsystem enhancements. Among those, one slide stood out because it dealt with out-of-order memory accesses. According to the slide, RDNA 4 allows requests from different shaders to be satisfied out-of-order, and adds…

12:50 · Mar 22, 2025 · Sat

Use Long Options in Scripts https://matklad.github.io/2025/03/21/use-long-options-in-scripts.html

matklad.github.io

Use Long Options in Scripts

Many command line utilities support short form options (-f) and long form options (--force).
Short form is for interactive usage. In scripts, use the long form.

21:26 · Mar 21, 2025 · Fri

Anyone want to play with some fun DNS stuff?
~~https://dash.xns.one/invite/127864de-70a1-4ac4-a2d7-28fffc8d6b79~~
~~https://dash.xns.one/invite/e301fbda-ac6f-4270-ac03-9d349baff4ab~~
~~https://dash.xns.one/invite/ae4c82e5-07b0-406a-a1f0-13a1c47e3751~~
~~https://dash.xns.one/invite/b1a18377-e25f-4e29-9556-12189da6c91f~~

04:42 · Mar 20, 2025 · Thu

Chips and Cheese
Looking Ahead at Intel’s Xe3 GPU Architecture
#ChipAndCheese

Telegraph | source
(author: Chester Lam)

Intel’s foray into high performance graphics has enjoyed impressive progress over the past few years, and the company is not letting up on the gas. Tom Peterson from Intel has indicated that Xe3 hardware design is complete, and software work is underway.…

23:36 · Mar 17, 2025 · Mon

uuuu

23:36 · Mar 17, 2025 · Mon

#PL #Rust #OS #Linux | Starting with 25.10, Ubuntu will use uutils in place of GNU coreutils.
https://www.osnews.com/story/141908/ubuntu-to-replace-classic-coreutils-and-more-with-new-rust-based-alternatives/

22:49 · Mar 17, 2025 · Mon

Harry Chen’s Blog
Bitten by CMap again: exploring the character mapping tables in fonts and PDF files

Telegraph | source
(author: Shengqi Chen)

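Today is the day graduate students submit the first draft of their theses (which, of course, has nothing to do with me). At noon a labmate came to me and said that "Teacher GPT" had found the following problem: (in the English keywords section) every English semicolon ";" renders incorrectly (as the Greek question mark character U+037E) and should be replaced with the standard English semicolon ";". I opened the PDF generated by thuthesis and, sure enough, that was the case. It seemed odd, so I looked at the thuthesis repository, where the corresponding code reads: \thu@clist@use{\thu@keywords@en}{; }% It looks perfectly fine. So what is going on? TL;DR: it is PDF…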

01:32 · Mar 16, 2025 · Sun

Daniel Lemire's blog
Speeding up C++ code with template lambdas

Let us consider a simple C++ function which divides all values in a range of integers:

void divide(std::span<int> i, int d) {
    for (auto& value : i) {
        value /= d;
    }
}

If the divisor d is known at compile time, this function can be much faster. For example, if d is 2, the compiler might optimize away the division and use a shift and a few cheap instructions instead. The same holds for any compile-time constant: the compiler can often generate better code when it knows the value.

In C++, a template function is defined using the template keyword followed by a parameter (usually a type parameter) enclosed in angle brackets < >. The template parameter acts as a placeholder that is replaced with an actual type, or with a value in the case of a non-type parameter, when the template is instantiated.

In C++, you can turn the division parameter into a template parameter:

template <int d>
void divide(std::span<int> i) {
    for (auto& value : i) {
        value /= d;
    }
}

The template function is not itself a function, but rather a recipe to generate functions: we provide the integer d and a function is created. This allows the compiler to work with a compile-time constant, producing faster code.

If you expect the divisor to be between 2 and 6, you can call the template function from a general-purpose function like so:

void divide_fast(std::span<int> i, int d) {
    if(d == 2) {
        return divide<2>(i);
    }
    if(d == 3) {
        return divide<3>(i);
    }
    if(d == 4) {
        return divide<4>(i);
    }
    if(d == 5) {
        return divide<5>(i);
    }
    if(d == 6) {
        return divide<6>(i);
    }

    for (auto& value : i) {
        value /= d;
    }
}

You could do it with a switch/case if you prefer but it does not simplify the code significantly.

Unfortunately, we have to expose a template function, which creates noise in our code base. We would prefer to keep all the logic inside one function. We can do so with lambda functions.
In C++, a lambda function (or lambda expression) is an anonymous, inline function that you can define on the fly, typically for short-term use. Starting with C++20, you have template lambda expressions.
We can almost do it like so:

void divide_fast(std::span<int> i, int d) {
    auto f = [&i]<int divisor>() {
      for (auto& value : i) {
        value /= divisor;
      }
    };
    if(d == 2) {
        return f<2>();
    }
    if(d == 3) {
        return f<3>();
    }
    if(d == 4) {
        return f<4>();
    }
    if(d == 5) {
        return f<5>();
    }
    if(d == 6) {
        return f<6>();
    }

    for (auto& value : i) {
        value /= d;
    }
}

Unfortunately, it does not quite work. With template lambda expressions, you cannot pass template arguments directly at the call site, and you need something ugly (‘template operator()<params>’):

void divide_fast(std::span<int> i, int d) {
    auto f = [&i]<int divisor>() {
      for (auto& value : i) {
        value /= divisor;
      }
    };
    if(d == 2) {
        return f.template operator()<2>();
    }
    if(d == 3) {
        return f.template operator()<3>();
    }
    if(d == 4) {
        return f.template operator()<4>();
    }
    if(d == 5) {
        return f.template operator()<5>();
    }
    if(d == 6) {
        return f.template operator()<6>();
    }

    for (auto& value : i) {
        value /= d;
    }
}

In practice, it might still be a good choice. It keeps all the messy optimization hidden inside your function.

source
