Strings: reading notes on 《Go语言底层原理剖析》 (The Underlying Principles of the Go Language)

Strings

  • Strings generally come in two kinds: one whose length is fixed at compile time and cannot be modified, and one with a dynamic length that can be modified.
  • Strings in Go are immutable: their contents can only be read. An index expression such as s[0] can read a byte, but it cannot be used to modify the string.
  • There are two ways to mark where a string ends. C does it implicitly, using the character '\0' as a terminator. Go does it explicitly, by recording the string's length.
  • Go represents a string with a header structure of two fields: Data, which points to the underlying byte array, and Len, which holds the string's length in bytes.
  • A string is essentially an array of characters. When stored, each character corresponds to one or more integers, and that mapping is determined by the character set's encoding.
  • All Go source files are encoded in UTF-8, and string constants are UTF-8 encoded as well. An ASCII letter takes 1 byte, and a common Chinese character takes 3 bytes.
  • The rune type: the Go designers considered that calling the elements of a string "characters" could be ambiguous. Some characters look very similar yet encode to different integers, such as the lowercase letter a and the accented à. Go therefore uses the rune type to represent and distinguish the "characters" of a string. rune is simply an alias for int32.
  • Ranging over a string iterates over runes, not individual bytes. range yields two values, index and value. index is the byte offset at which each rune starts, and value is the rune itself: an int32 holding its Unicode code point.
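  • String constants are stored in a static (read-only) data area, so their contents cannot be changed. A string literal can be written in two ways. Double quotes give an interpreted literal, in which escape sequences are processed. Backquotes give a raw literal. Single quotes denote a rune literal, not a string.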
  • Concatenating constant strings happens at compile time, while concatenating non-constant strings happens at run time. For run-time concatenation, a temporary 32-byte buffer is used if the result is no longer than 32 bytes and does not escape. Otherwise, memory for the result is allocated on the heap.
  • Note: converting between a byte slice ([]byte) and a string is not a free reinterpretation of the same pointer. A direct conversion between string and []byte is implemented by copying the underlying data. A zero-copy conversion can be done with unsafe.Pointer (pointer conversion) and uintptr (pointer arithmetic). In that case the byte slice must never be modified afterwards, because doing so would change a supposedly immutable string.
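The following is a minimal sketch of the representation notes above: an explicit length instead of a terminator, the two-field header, and read-only indexing. It assumes the gc toolchain, where a string value is one pointer word plus one length word. The helper headerWords is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words a string value occupies.
// Assumption: gc toolchain, where a string is a (Data pointer, Len int) pair.
func headerWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	s := "ab\x00cd"            // an embedded NUL byte does not end a Go string
	fmt.Println(len(s))        // 5: the length is stored, not found by scanning for '\0'
	fmt.Println(headerWords()) // 2: one pointer word plus one length word
	b := s[1]                  // reading a byte by index is allowed...
	// s[1] = 'x'              // ...but assigning through an index is rejected by the compiler
	fmt.Println(b)             // 98, the byte value of 'b'
}
```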
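This sketch makes the UTF-8 byte counts and the a/à example concrete. It compares len, which counts bytes, with utf8.RuneCountInString from the standard library, which counts runes. The helper byteAndRuneCount is a hypothetical name. The accented à is written as the escape \u00e0 so the example does not depend on how the source file normalizes accented characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount returns the UTF-8 byte length and the rune count of s.
func byteAndRuneCount(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"a", "中", "Go语言", "\u00e0"} {
		b, r := byteAndRuneCount(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
	// 'a' and 'à' look alike but are distinct code points, i.e. distinct int32 values.
	var a, accented rune = 'a', '\u00e0'
	fmt.Println(a, accented) // 97 224
}
```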
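The range behaviour described in the notes can be checked with a short sketch that records each iteration's byte offset and rune value. runeAt and runeOffsets are hypothetical names. For "Go语言", the offsets jump from 2 to 5 because 语 occupies three bytes.

```go
package main

import "fmt"

// runeAt holds what one iteration of "for i, r := range s" produces.
type runeAt struct {
	Offset int  // byte offset of the rune's first byte
	Value  rune // decoded Unicode code point (an int32)
}

// runeOffsets collects the (offset, rune) pairs yielded by ranging over s.
func runeOffsets(s string) []runeAt {
	var out []runeAt
	for i, r := range s {
		out = append(out, runeAt{Offset: i, Value: r})
	}
	return out
}

func main() {
	for _, p := range runeOffsets("Go语言") {
		fmt.Printf("offset %d: %c (U+%04X)\n", p.Offset, p.Value, p.Value)
	}
}
```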
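Here is a small sketch of Go's literal forms: double quotes, backquotes, and single quotes. Single quotes produce a rune rather than a string. The variable names are illustrative only.

```go
package main

import "fmt"

var (
	interpreted = "line1\nline2" // double quotes: \n becomes a single newline byte
	raw         = `line1\nline2` // backquotes: the backslash and the n are kept as two bytes
	single      = 'a'            // single quotes: a rune literal (int32), not a string
)

func main() {
	fmt.Println(len(interpreted), len(raw)) // 11 12
	fmt.Printf("%T\n", single)              // int32
}
```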
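This sketch illustrates the compile-time versus run-time split for concatenation. Concatenating two constants yields another constant, so len of the result can size an array type. That would not compile for a run-time value. The 32-byte temporary buffer is an internal detail of the gc runtime and is not observable here. greeting, fixed and join are hypothetical names.

```go
package main

import "fmt"

// Both operands are constants, so the compiler folds the concatenation and
// greeting is itself a constant string.
const greeting = "Hello, " + "Go"

// This declaration compiles only because len(greeting) is a compile-time constant.
var fixed [len(greeting)]byte

// join concatenates two non-constant strings; in general this is done at
// run time by the Go runtime rather than by the compiler.
func join(a, b string) string {
	return a + b
}

func main() {
	fmt.Println(len(fixed))               // 9
	fmt.Println(join("Hello, ", "world")) // Hello, world
}
```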
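The following sketch contrasts the copying conversion with zero-copy conversions. The note describes manual unsafe.Pointer/uintptr header manipulation. This sketch instead substitutes unsafe.String, unsafe.StringData and unsafe.Slice, which require Go 1.20 or later. The helper names are hypothetical. Mutating the slice after the zero-copy conversion shows that the resulting string shares its memory, which is exactly why such slices must be treated as read-only.

```go
package main

import (
	"fmt"
	"unsafe"
)

// copyConvert uses the ordinary conversion, which copies the bytes.
func copyConvert(b []byte) string {
	return string(b)
}

// zeroCopyString reinterprets b as a string without copying (Go 1.20+).
// The caller must never modify b afterwards, or the "immutable" string changes.
func zeroCopyString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

// zeroCopyBytes exposes the bytes of s without copying (Go 1.20+).
// The returned slice must be treated as read-only.
func zeroCopyBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("hello")
	copied := copyConvert(b)
	shared := zeroCopyString(b)

	b[0] = 'j'          // mutate the original slice
	fmt.Println(copied) // hello: the conversion made its own copy
	fmt.Println(shared) // jello: shares b's backing array

	fmt.Println(len(zeroCopyBytes("abc"))) // 3
}
```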