# Performance optimization approaches

## Memory optimization

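The general ideas are:

1. Allocating fewer objects reduces the number of objects the GC has to scan
2. Avoid frequently creating and destroying temporary objects, which puts pressure on the GC
3. Preallocate to reduce the number of times a container has to grow
4. Where possible, allocate one contiguous, sufficiently large buffer for data processing
5. Use built-in functions or methods to reduce memory copies
6. Where necessary, deliberately work with escape analysis so that objects are allocated on the stack whenever possible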

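### Allocate contiguous memory

When converting a `[]*A` into a `[]*B`, first allocate one contiguous block of memory with `make([]B, len(sliceA))`, then point each element of the result at an element of that block.

Benefits: 1. The memory is contiguous, so iterating over it is faster. 2. It saves len(sliceA)-1 allocations, because all the `B` values come from a single allocation instead of one allocation each.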

```go
package main


// A ...
type A struct {
    A1 int32
    A2 int32
}

// B ...
type B struct {
    B1 int
    B2 int
}

func conv(sliceA []*A) []*B {
    var (
        tempSliceB = make([]B, len(sliceA))
        sliceB     = make([]*B, len(sliceA))
    )
    for i := 0; i < len(sliceA); i++ {
        tempSliceB[i].B1 = int(sliceA[i].A1)
        tempSliceB[i].B2 = int(sliceA[i].A2)
        sliceB[i] = &tempSliceB[i]
    }
    return sliceB
}

func main() {
    var sliceA = []*A{{A1: 0}, {A1: 1}, {A1: 2}}
    conv(sliceA)
}
```
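
To check the allocation savings, a sketch like the following compares the contiguous version with a hypothetical per-element version (`convNaive`, not part of the original) using `testing.AllocsPerRun`. The package-level `sink` is an assumption added only to keep the results reachable so the allocations are not optimized away.

```go
package main

import (
	"fmt"
	"testing"
)

type A struct {
	A1 int32
	A2 int32
}

type B struct {
	B1 int
	B2 int
}

// sink keeps the results reachable so the allocations cannot be optimized away.
var sink []*B

// convNaive (hypothetical) allocates every B separately: len(sliceA)+1 allocations.
func convNaive(sliceA []*A) []*B {
	sliceB := make([]*B, len(sliceA))
	for i, a := range sliceA {
		sliceB[i] = &B{B1: int(a.A1), B2: int(a.A2)}
	}
	return sliceB
}

// conv is the contiguous version from above: two allocations in total.
func conv(sliceA []*A) []*B {
	tempSliceB := make([]B, len(sliceA))
	sliceB := make([]*B, len(sliceA))
	for i, a := range sliceA {
		tempSliceB[i] = B{B1: int(a.A1), B2: int(a.A2)}
		sliceB[i] = &tempSliceB[i]
	}
	return sliceB
}

func main() {
	sliceA := make([]*A, 100)
	for i := range sliceA {
		sliceA[i] = &A{A1: int32(i), A2: int32(i)}
	}
	naive := testing.AllocsPerRun(100, func() { sink = convNaive(sliceA) })
	contiguous := testing.AllocsPerRun(100, func() { sink = conv(sliceA) })
	fmt.Printf("per-element: %.0f allocs/op, contiguous: %.0f allocs/op\n", naive, contiguous)
}
```

With 100 elements, the per-element version should report about 101 allocations per call (one per `B` plus the pointer slice), while the contiguous version needs 2.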

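### Memory alignment

#### Order struct fields sensibly

In `B` below, the compiler has to insert padding after `a` so that the `int64` field `c` starts on an 8-byte boundary, and again after `b` so that the struct size is a multiple of 8; in `A`, the two `int32` fields fill those 8 bytes exactly.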

```go
package main

import (
    "fmt"
    "unsafe"
)

type A struct {
    a int32
    b int32
    c int64
}

type B struct {
    a int32
    c int64
    b int32
}

func main() {
    fmt.Printf("size of A is %d\n", unsafe.Sizeof(A{})) // size of A is 16
    fmt.Printf("size of B is %d\n", unsafe.Sizeof(B{})) // size of B is 24
}
```

#### Group fields by type, e.g. a map bucket lays out key1, key2, key3, value1, value2, value3 contiguously to avoid unnecessary padding

```go
// https://github.com/golang/go/blob/e9e0d1ef704c4bba3927522be86937164a61100c/src/runtime/map.go#L150-L150
// A bucket for a Go map.
type bmap struct {
  // tophash generally contains the top byte of the hash value
  // for each key in this bucket. If tophash[0] < minTopHash,
  // tophash[0] is a bucket evacuation state instead.
  tophash [bucketCnt]uint8
  // Followed by bucketCnt keys and then bucketCnt elems.
  // NOTE: packing all the keys together and then all the elems together makes the
  // code a bit more complicated than alternating key/elem/key/elem/... but it allows
  // us to eliminate padding which would be needed for, e.g., map[int64]int8.
  // Followed by an overflow pointer.
}
```

#### Avoid false sharing with explicit padding

```go
// https://github.com/golang/go/blob/e9e0d1ef704c4bba3927522be86937164a61100c/src/sync/pool.go#L70-L70
type poolLocal struct {
  poolLocalInternal

  // Prevents false sharing on widespread platforms with
  // 128 mod (cache line size) = 0 .
  pad [128 - unsafe.Sizeof(poolLocalInternal{})%128]byte
}
```

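### Reduce the number of objects

#### Merge small structs

With composition like the following, the small struct `B` is embedded in `A` by value rather than through a pointer, so `new(A)` performs only one allocation. **This reduces the number of objects and therefore the number of objects the GC has to scan.**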

```go
package main

type A struct {
    a1 int32
    a2 int32
    B
}

type B struct {
    b1 int32
    b2 int32
}
```
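
`testing.AllocsPerRun` can show the difference. In this sketch `APtr` is a hypothetical variant that holds `B` through a pointer, and the package-level sink variables exist only to force the objects onto the heap.

```go
package main

import (
	"fmt"
	"testing"
)

type B struct {
	b1 int32
	b2 int32
}

// A embeds B by value, so a single allocation covers both.
type A struct {
	a1 int32
	a2 int32
	B
}

// APtr is a hypothetical variant that holds B through a pointer.
type APtr struct {
	a1 int32
	a2 int32
	b  *B
}

// Package-level sinks force the objects onto the heap.
var (
	sinkA    *A
	sinkAPtr *APtr
)

func main() {
	embedded := testing.AllocsPerRun(100, func() { sinkA = new(A) })
	pointer := testing.AllocsPerRun(100, func() { sinkAPtr = &APtr{b: new(B)} })
	fmt.Printf("embedded: %.0f allocs/op, pointer: %.0f allocs/op\n", embedded, pointer)
}
```

The embedded form should report 1 allocation per object and the pointer form 2, which also means twice as many heap objects for the GC to track.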

#### Concatenate strings strategically

[Efficient String Concatenation in Go](https://hermanschaaf.com/efficient-string-concatenation-in-go/)

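Concatenating directly with `+` creates extra temporary objects (it performs well with up to about 5 elements).

`strings.Join()` reduces the number of temporary objects but requires building a slice of strings first (it performs well when you already have a slice of strings to join).

`strings.Builder` or `bytes.Buffer` concatenates by writing into a buffer (it performs well with more than 5 elements).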
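
A sketch of the buffer approach with `strings.Builder`: `joinWithBuilder` is a hypothetical helper that computes the final length first and calls `Grow` once, so appending never has to reallocate. `strings.Join` in the standard library follows essentially the same strategy, so for a ready-made slice it is usually the simpler choice.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWithBuilder sizes the builder once, then appends every part,
// so the backing buffer is allocated a single time.
func joinWithBuilder(parts []string, sep string) string {
	if len(parts) == 0 {
		return ""
	}
	n := len(sep) * (len(parts) - 1)
	for _, p := range parts {
		n += len(p)
	}

	var sb strings.Builder
	sb.Grow(n) // reserve the exact final size up front
	sb.WriteString(parts[0])
	for _, p := range parts[1:] {
		sb.WriteString(sep)
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	parts := []string{"a", "b", "c", "d", "e", "f"}
	fmt.Println(joinWithBuilder(parts, ","))                             // a,b,c,d,e,f
	fmt.Println(joinWithBuilder(parts, ",") == strings.Join(parts, ",")) // true
}
```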

### Check bounds up front

```go
// binary.LittleEndian.Uint64()
// https://github.com/golang/go/blob/e9e0d1ef704c4bba3927522be86937164a61100c/src/encoding/binary/binary.go#L77-L77
func (littleEndian) Uint64(b []byte) uint64 {
  _ = b[7] // bounds check hint to compiler; see golang.org/issue/14808
  return uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 |
    uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56
}
```

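Benchmark: `BenchmarkBoundTop` accesses the highest index first, so once `list[8]` has passed its check the compiler can drop the checks for the smaller indices, while `BenchmarkBoundLow` needs a separate check for each access.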

```go
//BenchmarkBoundLow
//BenchmarkBoundLow-12        868246250             1.34 ns/op           0 B/op           0 allocs/op
//BenchmarkBoundTop
//BenchmarkBoundTop-12        1000000000             0.511 ns/op           0 B/op           0 allocs/op

package demo

import (
    "testing"
)

var list = []int64{0, 1, 2, 3, 4, 5, 6, 7, 8}

func BenchmarkBoundLow(b *testing.B) {
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = list[0]
        _ = list[1]
        _ = list[2]
        _ = list[3]
        _ = list[4]
        _ = list[5]
        _ = list[6]
        _ = list[7]
        _ = list[8]
    }
}

func BenchmarkBoundTop(b *testing.B) {
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = list[8]
        _ = list[7]
        _ = list[6]
        _ = list[5]
        _ = list[4]
        _ = list[3]
        _ = list[2]
        _ = list[1]
        _ = list[0]
    }
}
```
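
The same hint can be used in ordinary code. In the sketch below (`sumFirst4` is a hypothetical function), `_ = b[3]` performs one check up front; after it the compiler can prove that `b[0]` to `b[3]` are in range. Building with, for example, `go build -gcflags=-d=ssa/check_bce/debug=1` lists the bounds checks that remain, which is a way to verify this on a given Go version.

```go
package main

import "fmt"

// sumFirst4 adds the first four elements of b. The hint `_ = b[3]` panics
// early if b is too short; once it passes, b[0]..b[3] are known to be in
// range and the compiler can drop their individual checks.
func sumFirst4(b []int64) int64 {
	_ = b[3] // bounds check hint, same idea as binary.LittleEndian.Uint64
	return b[0] + b[1] + b[2] + b[3]
}

func main() {
	fmt.Println(sumFirst4([]int64{1, 2, 3, 4, 5})) // 10
}
```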

### Avoid frequently creating temporary objects

#### Cache objects with sync.Pool
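
A minimal sketch of the usual pattern: a `sync.Pool` of `*bytes.Buffer` values reused by a hypothetical hot-path function `render`. Two points matter: reset whatever comes out of the pool, since it may still hold data from a previous user, and do not keep references to a pooled object after putting it back.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values so that a hot path does
// not allocate a fresh buffer on every call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render builds a greeting in a pooled scratch buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold a previous user's data
	defer bufPool.Put(buf) // return it once the result has been copied out

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so the result outlives Put
}

func main() {
	fmt.Println(render("gopher")) // hello, gopher
	fmt.Println(render("pool"))   // hello, pool
}
```

The pool may drop cached objects at any garbage collection, so it is a cache for reducing allocations, not a place to keep state.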

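#### Avoid deep call stacks

A goroutine's stack starts at 2 KB by default (since Go 1.4). Go uses contiguous stacks: when the stack runs out of space, the Go runtime keeps growing it: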

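* When the stack runs out of space, it is doubled: the variables on the old stack are copied to the new stack, and pointers to them are updated to the new addresses;
* Returning from functions frees stack space; if the GC finds that less than 1/4 of the stack is in use, it shrinks the stack by half.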

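For example, if a stack eventually reaches 2 MB, growing it from 2 KB takes 10 doublings in the worst case (2 KB × 2^10 = 2 MB), each with a full copy, which hurts performance.

Recommendations:

* Keep call stacks and function complexity under control; don't do all the work in a single goroutine;
* If a deep call stack is genuinely needed, consider **goroutine pooling** to avoid the stack resizing that comes with frequently creating new goroutines.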

### Estimate capacity to reduce growth

#### bytes.Buffer

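`bytes.Buffer` allocates a contiguous block of memory, so you can give it a sufficiently large capacity up front.

It is worth reading the source to confirm whether, when the buffer runs out of capacity, it calls grow and therefore allocates a new backing array and copies the existing data.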
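
A sketch of sizing a `bytes.Buffer` up front, assuming the final size `n` is known in advance: after `Grow(n)`, the `bytes.Buffer` documentation guarantees that at least `n` more bytes can be written without another allocation, so the capacity should not change while writing. `fillPrealloc` is a hypothetical helper; `bytes.NewBuffer(make([]byte, 0, n))` achieves the same effect.

```go
package main

import (
	"bytes"
	"fmt"
)

// fillPrealloc reserves n bytes with Grow, writes exactly n bytes, and
// reports the final length and whether the capacity changed while writing.
func fillPrealloc(n int) (int, bool) {
	var buf bytes.Buffer
	buf.Grow(n) // one allocation up front
	capBefore := buf.Cap()
	for i := 0; i < n; i++ {
		buf.WriteByte('x') // always fits in the space reserved by Grow
	}
	return buf.Len(), buf.Cap() != capBefore
}

func main() {
	n, grew := fillPrealloc(4096)
	fmt.Println(n, grew) // 4096 false
}
```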

#### Preallocate slices and maps
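
A sketch for slices and maps whose final size is known from the input (`buildIndex` is a hypothetical function): `make([]T, 0, n)` reserves capacity so `append` does not reallocate, and the size hint in `make(map[K]V, n)` lets the runtime allocate room for `n` entries up front instead of growing the map while inserting.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildIndex knows the final sizes from len(ids), so both containers are
// allocated once at the right size.
func buildIndex(ids []int) ([]string, map[string]int) {
	keys := make([]string, 0, len(ids))     // len 0, cap len(ids): append never reallocates
	index := make(map[string]int, len(ids)) // size hint for len(ids) entries
	for i, id := range ids {
		k := "id-" + strconv.Itoa(id)
		keys = append(keys, k)
		index[k] = i
	}
	return keys, index
}

func main() {
	keys, index := buildIndex([]int{7, 8, 9})
	fmt.Println(keys, cap(keys), index["id-8"]) // [id-7 id-8 id-9] 3 1
}
```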

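### Reduce unnecessary memory copies

For example, copy data with `io.Copy` and similar functions instead of allocating an extra buffer as an intermediary.

For example, use `Readv`/`Writev` to read or write non-contiguous memory in a single call, instead of first merging it into an intermediate buffer.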

### Escape analysis

[Escape analysis](https://1005281342.gitbook.io/gofun/tao-yi-fen-xi/tao-yi-fen-xi)
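
A small sketch to try with `go build -gcflags=-m`, which prints the compiler's escape-analysis decisions. `newPoint` returns the address of a local, so the compiler should report that `p` is moved to the heap; `makePoint` returns a copy and needs no heap allocation. The function names are illustrative, and if `newPoint` is inlined into a caller the compiler may still keep that particular call's value on the stack.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPoint returns a pointer to a local, so p outlives the call and the
// compiler has to move it to the heap.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

// makePoint returns the value itself; the result is copied to the caller,
// so nothing needs to escape.
func makePoint(x, y int) point {
	return point{x, y}
}

func main() {
	a := newPoint(1, 2)
	b := makePoint(3, 4)
	fmt.Println(a.x+b.x, a.y+b.y) // 4 6
}
```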

## Concurrency optimization

[Go performance optimization summary](https://johng.cn/go-optimize-brief/)

### Use a goroutine pool for highly concurrent task processing
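
A minimal fixed-size worker pool, as a sketch assuming independent tasks: `runPool` (hypothetical) starts `workers` goroutines once and feeds them task indices over a channel, so the number of goroutines, and the stacks they need, stays constant no matter how many tasks arrive.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool applies fn to every task using a fixed number of worker goroutines.
// The workers are created once and reused, instead of one goroutine per task.
func runPool(workers int, tasks []int, fn func(int) int) []int {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise sending would block forever
	}
	results := make([]int, len(tasks))
	jobs := make(chan int) // carries task indices

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = fn(tasks[i]) // each index is written by exactly one worker
			}
		}()
	}

	for i := range tasks {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	squares := runPool(4, []int{1, 2, 3, 4, 5}, func(x int) int { return x * x })
	fmt.Println(squares) // [1 4 9 16 25]
}
```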

### Avoid calling synchronous system interfaces under high concurrency
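
One way to apply this, sketched under the assumption that the blocking calls are file reads: while a goroutine is blocked in a system call, the runtime may start another OS thread to keep running other goroutines, so thousands of simultaneous blocking calls can mean thousands of threads. A buffered channel used as a counting semaphore caps how many goroutines are inside such a call at once. `readAll` and `maxBlocking` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// maxBlocking caps how many goroutines may be inside a blocking file read at once.
const maxBlocking = 8

// readAll returns the size of every readable file in paths; unreadable ones are skipped.
func readAll(paths []string) map[string]int {
	sem := make(chan struct{}, maxBlocking) // counting semaphore
	sizes := make(map[string]int, len(paths))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, p := range paths {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			sem <- struct{}{} // acquire a slot before the system call
			data, err := os.ReadFile(p)
			<-sem // release the slot as soon as the call returns
			if err != nil {
				return
			}
			mu.Lock()
			sizes[p] = len(data)
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return sizes
}

func main() {
	dir, err := os.MkdirTemp("", "semdemo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	var paths []string
	for i := 0; i < 20; i++ {
		p := filepath.Join(dir, fmt.Sprintf("f%d.txt", i))
		if err := os.WriteFile(p, []byte("hello"), 0o644); err != nil {
			panic(err)
		}
		paths = append(paths, p)
	}
	fmt.Println(len(readAll(paths))) // 20
}
```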

### Under high concurrency, make locks on shared objects finer-grained or avoid them
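
A common way to shrink lock granularity is sharding: split one map guarded by one mutex into several independently locked shards, chosen by a hash of the key. In this sketch `shardedCounter` and `shardCount` are hypothetical, and FNV-1a from `hash/fnv` is just one reasonable choice of hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16

// shard is one independently locked part of the counter.
type shard struct {
	mu sync.Mutex
	m  map[string]int
}

// shardedCounter replaces one global lock with shardCount smaller ones, so
// goroutines working on keys in different shards do not contend.
type shardedCounter struct {
	shards [shardCount]shard
}

func newShardedCounter() *shardedCounter {
	c := &shardedCounter{}
	for i := range c.shards {
		c.shards[i].m = make(map[string]int)
	}
	return c
}

// shardFor picks a shard from an FNV-1a hash of the key.
func (c *shardedCounter) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return &c.shards[h.Sum32()%shardCount]
}

func (c *shardedCounter) Inc(key string) {
	s := c.shardFor(key)
	s.mu.Lock()
	s.m[key]++
	s.mu.Unlock()
}

func (c *shardedCounter) Get(key string) int {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.m[key]
}

func main() {
	c := newShardedCounter()
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Inc(fmt.Sprintf("key-%d", i%10))
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.Get("key-3")) // 800
}
```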

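## Inlining

The Go compiler automatically inlines eligible functions into their callers at compile time, avoiding the cost of a function call and return, such as pushing and popping arguments on the stack.

When a called function is long, you can split it so that the frequently taken branch is small enough to be inlined into the caller.

For example, this code in `sync.Once`: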

```go
// https://github.com/golang/go/blob/e9e0d1ef704c4bba3927522be86937164a61100c/src/sync/once.go#L58-L58
func (o *Once) Do(f func()) {
  // Note: Here is an incorrect implementation of Do:
  //
  //    if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
  //        f()
  //    }
  //
  // Do guarantees that when it returns, f has finished.
  // This implementation would not implement that guarantee:
  // given two simultaneous calls, the winner of the cas would
  // call f, and the second would return immediately, without
  // waiting for the first's call to f to complete.
  // This is why the slow path falls back to a mutex, and why
  // the atomic.StoreUint32 must be delayed until after f returns.

  if atomic.LoadUint32(&o.done) == 0 {
    // Outlined slow-path to allow inlining of the fast-path.
    o.doSlow(f)
  }
}

func (o *Once) doSlow(f func()) {
  o.m.Lock()
  defer o.m.Unlock()
  if o.done == 0 {
    defer atomic.StoreUint32(&o.done, 1)
    f()
  }
}
```

## Replace branches with bitwise operations

[Branch predictor](https://en.wikipedia.org/wiki/Branch_predictor)

[Parsing JSON Really Quickly: Lessons Learned](https://www.youtube.com/watch?v=wlvKAT7SZIQ)
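
Two classic branch-free idioms, as a sketch: `absBranchless` turns the sign bit into a mask, and `selectBranchless` picks one of two values with a mask instead of an `if`. Both names are illustrative. The Go compiler can already turn some simple branches into conditional moves, so such rewrites are worth keeping only where a benchmark shows that mispredicted branches actually cost something.

```go
package main

import "fmt"

// absBranchless computes |x| without a conditional jump. x >> 63 is an
// arithmetic shift on a signed value, so m is 0 for x >= 0 and -1 (all
// ones) for x < 0; (x ^ m) - m is then x or ^x + 1 == -x respectively.
// Like the branchy version, it returns math.MinInt64 unchanged.
func absBranchless(x int64) int64 {
	m := x >> 63
	return (x ^ m) - m
}

// selectBranchless returns a if cond is 1 and b if cond is 0 (cond must be 0 or 1).
func selectBranchless(cond, a, b int64) int64 {
	mask := -cond // 0 -> all zeros, 1 -> all ones
	return (a & mask) | (b &^ mask)
}

func main() {
	fmt.Println(absBranchless(-5), absBranchless(7))                     // 5 7
	fmt.Println(selectBranchless(1, 10, 20), selectBranchless(0, 10, 20)) // 10 20
}
```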

## References

[Writing and Optimizing Go code](https://github.com/dgryski/go-perfbook/blob/master/performance-zh.md)

[Go performance optimization](https://cch123.github.io/perf_opt/)

[Go performance optimization summary](https://johng.cn/go-optimize-brief/)

[Efficient String Concatenation in Go](https://hermanschaaf.com/efficient-string-concatenation-in-go/)

