Handling English Characters in C

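We have mentioned strings several times already. A string is a sequence of characters enclosed in double quotes " ", for example "http://www.baidu.com" or "岳麓书院". The characters of a string are stored in memory in order, one right after another, so the whole string occupies one contiguous block of memory.

A string can of course contain just a single character, such as "A" or "6". For convenience, however, single characters are usually handled with a dedicated character type.

The character type beginners use most often is char. Its size is 1 byte, so it can only hold characters from the ASCII table, that is, English letters, digits, and common punctuation.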

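To handle characters outside that set, such as Chinese, Japanese, or Korean, char is not enough and a different character type is needed. The next section, "Handling Chinese Characters in C", covers this in detail.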

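Representing characters

  • A character is enclosed in single quotes ' '.
  • A string is enclosed in double quotes " ".

The following example shows how to assign values to variables of type char: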

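// Correct
char a = '1';
char b = '$';
char c = 'X';
char d = ' ';  // a space is a character too
// Incorrect
char x = '中';  // char cannot hold characters outside ASCII
char y = 'A';  // A is a fullwidth character, not the ASCII letter A
char z = "t";  // a character must be enclosed in single quotes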

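Note: in a character set, a fullwidth character and its halfwidth counterpart have different code numbers (encoded values), so they are two distinct characters. ASCII defines only halfwidth characters, not fullwidth ones.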

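Printing characters

There are two ways to print a char:

  • Use the dedicated character output function putchar.
  • Use the general-purpose formatted output function printf; the format specifier for char is %c.

See the following demonstration: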

#include <stdio.h>
int main() {
    char a = '1';
    char b = '$';
    char c = 'X';
    char d = ' ';
    // Output with putchar
    putchar(a); putchar(d);
    putchar(b); putchar(d);
    putchar(c); putchar('\n');
    // Output with printf
    printf("%c %c %c\n", a, b, c);
    return 0;
}

Output:

1 $ X
1 $ X

putchar prints only one character per call; printing several characters takes several calls.

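Characters and integers

As we know, a computer does not store the character itself; it stores the character's number in a character set (also called its encoded value). A char therefore actually holds the character's ASCII code.

In every character set a character's number is an integer, so from this point of view a character type is essentially no different from an integer type.

We can assign an integer to a character type, or print a character type as an integer. Conversely, we can assign a character to an integer type, or print an integer type as a character.

Consider the following example: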

#include <stdio.h>
int main()
{
    char a = 'E';   // character assigned to a char
    char b = 70;    // integer assigned to a char
    int c = 71;     // integer assigned to an int
    int d = 'H';    // character assigned to an int
    printf("a: %c, %d\n", a, a);
    printf("b: %c, %d\n", b, b);
    printf("c: %c, %d\n", c, c);
    printf("d: %c, %d\n", d, d);
    return 0;
}

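Output:

a: E, 69
b: F, 70
c: G, 71
d: H, 72

In the ASCII table, the characters 'E', 'F', 'G', and 'H' have the codes 69, 70, 71, and 72 respectively.

a, b, c, and d all actually store integers: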

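  • When a character is assigned to a or d, it is first converted to its ASCII code and then stored.
  • When an integer is assigned to b or c, no conversion is needed; it is stored directly.
  • When a, b, c, and d are printed with %c, the integer is converted to the corresponding character according to the ASCII table.
  • When a, b, c, and d are printed with %d, no conversion is needed; the integer is printed directly.

In short, the ASCII table is what links English characters to integers.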

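Strings revisited

We have covered the concept of a string and how to print one, but not yet how to store a string in a variable. C has no dedicated string type; a string can only be stored indirectly, using an array or a pointer.

Discussing strings at this point is awkward: arrays and pointers have not been introduced yet, so we cannot analyze the underlying mechanism, yet strings are so common that they cannot be skipped. This section therefore keeps it brief; for now, simply memorize the following two forms: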

char str1[] = "http://www.cdsy.xyz";
char *str2 = "城东书院";

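str1 and str2 are the names of the strings; the [ ] after the name and the * before it are fixed syntax. Beginners can treat the two forms as equivalent for now, although they are not identical in every respect. Both can be printed with the dedicated puts function or the general-purpose printf function.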

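A complete string demonstration:

#include <stdio.h>
int main()
{
    char web_url[] = "http://www.cdsy.xyz";  // array form
    char *web_name = "城东书院";              // pointer form
    // puts prints the string followed by a newline
    puts(web_url);
    puts(web_name);
    // printf prints strings with the %s format specifier
    printf("%s\n%s\n", web_url, web_name);
    return 0;
}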