Single-process containers && stateless containers

Reprinted from mafeifan's tech blog: Docker FAQ

Why use single-process containers

When using containers, prefer single-process containers wherever possible. A single-process container is one that has only one worker process while it is running.

If several processes need to cooperate, deploy them as separate containers, for example one container for PHP and another for MySQL, rather than running both in the same container.

The reason is that Docker itself is a very good daemon and manages a single process extremely well. If a container holds several processes, you have to maintain their running state yourself, for example with supervisord, which makes the container considerably harder to maintain and less stable.

For example, suppose PHP and MySQL run in the same container and PHP exits abnormally. Should the container exit too, taking MySQL down with it? If it does not exit and PHP is simply restarted over and over, then nothing outside the container, such as the output of docker ps, reveals the state PHP is in.

So when using Docker, get used to single-process containers: they are both simple and robust.
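
The point about visibility can be sketched in a few lines. This is only an illustration, assuming a Python entrypoint script; the `run_worker` helper and the failing child command are hypothetical and not from the original post. Docker reports the exit status of the container's main process, so an entrypoint that runs exactly one worker and passes its exit code through keeps the failure visible from outside.

```python
import subprocess
import sys


def run_worker(command):
    """Run the single worker process and return its exit code unchanged."""
    completed = subprocess.run(command)
    return completed.returncode


# Hypothetical worker: a child Python process that fails with exit code 3.
worker = [sys.executable, "-c", "import sys; sys.exit(3)"]
exit_code = run_worker(worker)

# A real entrypoint would finish with sys.exit(exit_code) so that the
# container's status matches the worker's status.
print("worker exited with", exit_code)
```

With a supervisor restarting PHP inside the container, this exit code would never reach Docker, which is the loss of visibility described above.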

Why use stateless containers

State here means the intermediate data a program generates while it runs. A stateless container is one that keeps no data inside the container at run time and instead stores all data outside it, for example in a database.

A stateful container loses data when it restarts abnormally, and it cannot be deployed as multiple replicas, so it cannot be load-balanced.

For example, PHP stores session data on disk by default, in a directory such as /tmp, and when several PHP containers sit behind a load balancer their directories are isolated from one another. Suppose there are two replicas, A and B. The user's first request is routed to A, which creates the session. The second request may be routed to B, which has no session data, and the result is bugs such as unexpected session timeouts.
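
The two-replica scenario can be simulated roughly as follows. This is a toy model rather than real PHP or Docker code: the `Replica` class, the round-robin balancer and the session id are all made up for illustration, and a plain dict stands in for an external store.

```python
import itertools


class Replica:
    """A PHP-like replica that keeps sessions in the store it is given."""

    def __init__(self, name, store=None):
        self.name = name
        # Default: each replica gets its own dict, like a private /tmp.
        self.store = store if store is not None else {}

    def login(self, session_id, user):
        self.store[session_id] = user

    def whoami(self, session_id):
        return self.store.get(session_id)  # None means "no such session"


def round_robin(replicas):
    """Hand out replicas in turn, like a very simple load balancer."""
    return itertools.cycle(replicas)


# Stateful replicas: the session written on A is invisible on B.
balancer = round_robin([Replica("A"), Replica("B")])
next(balancer).login("sid-1", "alice")
isolated_result = next(balancer).whoami("sid-1")

# Stateless replicas: both read and write one external store.
external_store = {}
balancer = round_robin([Replica("A", external_store), Replica("B", external_store)])
next(balancer).login("sid-1", "alice")
shared_result = next(balancer).whoami("sid-1")

print(isolated_result, shared_result)  # None alice
```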

If host volumes are used, with several containers mounting the same host directory, the session data can be shared. With load balancing across multiple hosts, however, sessions have to be stored in an external database or in Redis.

State is not limited to files; it can also live in memory. If a Node.js project keeps data in global variables, for example, that container is stateful as well and will show similar bugs. This is why containers should be stateless.
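
The in-memory case can be sketched the same way, here in Python rather than Node.js. Both counter classes are hypothetical, and a temporary file stands in for storage that lives outside the container; a "restart" is simulated by discarding the object and creating a new one.

```python
import json
import os
import tempfile


class InMemoryCounter:
    """Stateful: the count lives in process memory and dies with the process."""

    def __init__(self):
        self.count = 0

    def hit(self):
        self.count += 1
        return self.count


class ExternalCounter:
    """Stateless: the count is read from and written to an external file."""

    def __init__(self, path):
        self.path = path

    def hit(self):
        count = 0
        if os.path.exists(self.path):
            with open(self.path) as handle:
                count = json.load(handle)["count"]
        count += 1
        with open(self.path, "w") as handle:
            json.dump({"count": count}, handle)
        return count


service = InMemoryCounter()
service.hit()
service = InMemoryCounter()            # simulated restart
after_restart_in_memory = service.hit()

state_file = os.path.join(tempfile.mkdtemp(), "counter.json")
service = ExternalCounter(state_file)
service.hit()
service = ExternalCounter(state_file)  # simulated restart
after_restart_external = service.hit()

print(after_restart_in_memory, after_restart_external)  # 1 2
```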

Reprinted from "7 reasons why Docker is not suitable for deploying databases"

Several reasons why Docker is not suitable for deploying databases

I have doubts about whether some of these problems really exist, so treat this as reference only.

1. Data safety

Do not store data inside a container; this is also one of Docker's official tips for using containers. A container can be stopped or deleted at any time, and when it is removed with rm, the data inside it is lost. To avoid losing data, mount a data volume and store the data there.
However, container volumes are designed to provide persistent storage around the Union FS image layers, and data safety is not guaranteed. If the container crashes suddenly and the database is not shut down cleanly, the data may be corrupted.

2. Performance

MySQL is a relational database with high I/O demands. When one physical machine runs several instances, their I/O adds up and creates an I/O bottleneck that sharply reduces MySQL's read and write performance.
At a session on the ten hardest problems in applying Docker, an architect from a state-owned bank made this point: "The performance bottleneck of a database is usually I/O. With the Docker approach, the I/O requests of multiple containers still end up on the same storage. Most Internet databases today use a share-nothing architecture, which may be one reason they are not being migrated to Docker."
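
The "I/O adds up" argument is simple arithmetic, sketched below. The IOPS figures are invented purely for illustration and do not come from the article or from any measurement.

```python
def disk_utilization(instance_iops, disk_capacity_iops):
    """Return total I/O demand as a fraction of what the shared disk can serve."""
    return sum(instance_iops) / disk_capacity_iops


# Hypothetical figures: each MySQL instance needs 2000 IOPS,
# and the shared disk can serve 5000 IOPS.
one_instance = disk_utilization([2000], 5000)
three_instances = disk_utilization([2000, 2000, 2000], 5000)

# A value above 1.0 means demand exceeds what the disk can deliver.
print(one_instance, three_instances)
```

Under these assumed numbers a single instance uses well under half of the disk, while three instances on the same storage ask for more than it can deliver.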

Some people have proposed ways to address the performance problem:

(1) Separate the database program from the data
If MySQL runs in Docker, the database program and its data need to be separated: put the data on shared storage and the program in the container. If the container or the MySQL service fails, a brand-new container is started automatically. It is also advisable not to keep the data on the host: the host and the container share the volume group, so damage to the host has a large impact.

(2) Run lightweight or distributed databases
Deploy lightweight or distributed databases in Docker. Docker's own approach is that when a service dies, a new container is started automatically rather than the service being restarted again and again inside the existing container.

(3) Lay out applications sensibly
For applications or services with high I/O demands, it is more appropriate to deploy the database on a physical machine or in KVM. Tencent Cloud's TDSQL and Alibaba's OceanBase are currently both deployed directly on physical machines rather than in Docker.

3. Networking

To understand Docker networking you need a deep understanding of network virtualization, and you must be prepared to deal with unexpected situations.
The network is critical for database replication, which needs a stable 24/7 connection between the master and slave databases. Outstanding Docker networking problems were still unresolved in version 1.9.

4. State

In Docker, horizontal scaling works only for stateless compute services, not for databases.
Statelessness is a key property behind Docker's fast scaling, so anything that carries data state is not suited to being placed directly in Docker. If a database is installed in Docker, the storage service has to be provided separately. This is similar to point 1.
Currently, both Tencent Cloud's TDSQL (a distributed database for finance) and Alibaba Cloud's OceanBase (a distributed database system) run directly on physical machines rather than on the more easily managed Docker.

5. Resource isolation

For resource isolation, Docker really is weaker than a KVM virtual machine. Docker limits resources through cgroups, which can only cap the maximum a container consumes and cannot stop other programs from taking the resources it needs. If other applications use too much of the physical machine's resources, the read and write efficiency of MySQL inside the container suffers.
We have not seen any isolation features aimed at databases, so why should we put one in a container?
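
The "ceiling, not reservation" claim can be illustrated with a toy model. This is not how the kernel scheduler or cgroups actually distribute resources; the `allocate` function, the proportional-sharing rule and the core counts are all assumptions made for the sketch.

```python
def allocate(demands, limits, capacity):
    """Toy model: cap each demand at its limit, then share the host proportionally."""
    capped = [min(demand, limit) for demand, limit in zip(demands, limits)]
    total = sum(capped)
    if total <= capacity:
        return capped
    scale = capacity / total
    return [value * scale for value in capped]


# Hypothetical host with 4 CPU cores. The first container runs MySQL and is
# limited to 2 cores; the second is some other application limited to 4 cores.
quiet = allocate(demands=[2, 1], limits=[2, 4], capacity=4)
noisy = allocate(demands=[2, 4], limits=[2, 4], capacity=4)

print(quiet)  # MySQL gets the full 2 cores it asks for
print(noisy)  # MySQL gets less than 2 cores although it never exceeded its limit
```

In this model the limit keeps MySQL from using more than 2 cores, but nothing keeps the neighbour from pushing MySQL below 2 cores, which is the weakness the paragraph above describes.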