The table below lists symptoms you may encounter and how to resolve them.
Symptom | Solution |
---|---|
If you use rsync for state transfer and a node crashes before the state transfer is over, the rsync process may hang forever, occupying the port and preventing the node from restarting. The problem shows up as a "port in use" error in the server error log. | Find the orphaned rsync process and kill it manually. |
If you use mysqldump for state transfer and it fails, an SQL SYNTAX error is written to the server error log. This error is only an indication of the failure: the actual error message is contained in the pseudo-statement within the SQL SYNTAX error. | Read the pseudo-statement within the SQL SYNTAX error to find the actual cause. |
After a temporary split, if the Primary Component was still reachable and its state was modified, resynchronization occurs. During resynchronization, nodes on the other part of the cluster drop all client connections, and the connections receive the "Unknown command" error. | This situation clears once the node automatically resynchronizes with the Primary Component. |
Every query returns “Unknown command”. | This happens if you have explicitly specified the wsrep_provider variable but the wsrep provider refuses service, for example because the node is not connected to the cluster Primary Component (the wsrep_cluster_address parameter may be unset, or there may be networking issues). In this case, the node is considered out of sync with the global state and cannot serve SQL requests other than SET and SHOW. You can bypass the wsrep_provider check by switching the wsrep service off with `SET wsrep_on=0;`. This instructs mysqld to ignore the wsrep_provider setting and behave as a standalone MySQL server. It may lead to data inconsistency with the rest of the cluster, which can nevertheless be the desired result, for example when modifying “local” tables. If you know that no other nodes of your cluster form a Primary Component, rebootstrap the Primary Component on this node: the component this node is part of then becomes a Primary Component, and all nodes in it synchronize to the most up-to-date one and start accepting SQL requests again. |
Changes to users (name, host, password) are not replicated to the cluster. | You have tried to update the mysql.user table directly. Use the GRANT command instead. Currently, replication only works with the InnoDB storage engine: writes to tables of other types, including the system (mysql.*) tables, are not replicated. However, DDL statements are replicated at the statement level, and changes to mysql.* tables made that way do get replicated. You can safely issue commands such as CREATE USER... or GRANT..., but commands such as INSERT INTO mysql.user... will not be replicated. As a rule, non-transactional engines cannot be supported in multi-master replication. |
Cluster stalls when running the ALTER command on an unused table. | This is a side effect of a multi-master, multiple-applier setup. The system needs to control when the DDL ends relative to other transactions in order to detect conflicts deterministically and schedule parallel appliers; effectively, DDL commands must be executed in isolation. Galera Cluster for MySQL has a 65K window tolerance within which transactions can be applied in parallel, but if an ALTER command takes too long, the cluster has to wait. In general, you cannot avoid this. However, if you can guarantee that no other session will try to modify the table AND that no other DDLs are running, you can apply the ALTER command to one node at a time instead of cluster-wide. Do this on each node in turn. |
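For the two state transfer rows in the table above, it can help to confirm which transfer method a node uses and where its server error log lives before searching for the "port in use" or SQL SYNTAX messages. A minimal sketch, assuming a MySQL server with the Galera (wsrep) patches; the variable names `wsrep_sst_method` and `log_error` are standard server variables, not taken from the table:

```sql
-- State snapshot transfer method configured on this node (for example rsync or mysqldump).
SHOW GLOBAL VARIABLES LIKE 'wsrep_sst_method';
-- Location of the server error log that the troubleshooting rows refer to.
SHOW GLOBAL VARIABLES LIKE 'log_error';
```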
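For the temporary-split row in the table above, you can watch a node on the non-primary side until it has resynchronized. A minimal sketch, assuming the standard Galera status variables `wsrep_local_state_comment` and `wsrep_ready` (neither is named in the table):

```sql
-- Node state as text; reads 'Synced' once the node has caught up with the Primary Component.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
-- 'ON' when the node accepts queries again, 'OFF' while it is still resynchronizing.
SHOW GLOBAL STATUS LIKE 'wsrep_ready';
```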
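For the “Unknown command” row in the table above, SHOW statements still work on the affected node, so you can use them to tell a missing cluster address from a lost connection. A sketch, assuming the standard Galera status variables `wsrep_cluster_status` and `wsrep_connected`:

```sql
-- 'Primary' when part of the Primary Component; otherwise 'non-Primary' or 'Disconnected'.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
-- 'OFF' if the node has no network connection to any cluster component.
SHOW GLOBAL STATUS LIKE 'wsrep_connected';
-- An empty value means wsrep_cluster_address was never set.
SHOW GLOBAL VARIABLES LIKE 'wsrep_cluster_address';
```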
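The `SET wsrep_on=0;` bypass from the “Unknown command” row can be scoped to a single session so that only a deliberate local change skips replication. A sketch; the table `local_settings` and its columns are hypothetical:

```sql
-- Disable wsrep for this session only: the following statements apply to this node alone.
SET SESSION wsrep_on = 0;
-- Hypothetical node-local table; this change will NOT reach the other nodes.
UPDATE local_settings SET value = 'node-specific' WHERE name = 'cache_dir';
-- Re-enable wsrep for the rest of the session.
SET SESSION wsrep_on = 1;
```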
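The rebootstrap step in the “Unknown command” row does not name a command. One way to do it, assuming Galera's `pc.bootstrap` provider option and the `wsrep_last_committed` status variable (neither is given in the table), is sketched below. Only run it when you are certain that no other Primary Component exists, or the cluster can split into two diverging primaries.

```sql
-- 1. On every reachable node, read the last committed sequence number;
--    choose the node with the highest value as the most up-to-date one.
SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';
-- 2. On the chosen node only, promote its component to a Primary Component.
SET GLOBAL wsrep_provider_options = 'pc.bootstrap=YES';
-- 3. Confirm the promotion; this should now report 'Primary'.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
```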
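To illustrate the user-management row in the table above, account changes should go through statements that are replicated as statements rather than through direct writes to mysql.user. A sketch; the account `app_user`, host pattern, password, and database `app_db` are hypothetical:

```sql
-- Replicated at the statement level, so every node gets the new account.
CREATE USER 'app_user'@'10.0.0.%' IDENTIFIED BY 'change-me';
GRANT SELECT, INSERT, UPDATE ON app_db.* TO 'app_user'@'10.0.0.%';

-- Avoid: a direct write such as the one below is not replicated and stays on this node.
-- INSERT INTO mysql.user ... ;
```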
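Because the same row states that only InnoDB tables are replicated, a quick audit for tables using other engines can catch silently unreplicated writes. A sketch using the standard `information_schema.TABLES` view; the table `app_db.legacy_log` in the conversion step is hypothetical:

```sql
-- User tables whose writes will not be replicated because they are not InnoDB.
SELECT table_schema, table_name, engine
FROM information_schema.TABLES
WHERE table_type = 'BASE TABLE'
  AND engine <> 'InnoDB'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- Converting a (hypothetical) table so that its future writes are replicated.
ALTER TABLE app_db.legacy_log ENGINE = InnoDB;
```

Note that the conversion is itself a DDL statement, so on a large table it can trigger the stall described in the ALTER row.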
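The per-node ALTER procedure in the last row can be sketched with Galera's Rolling Schema Upgrade method. This assumes the `wsrep_OSU_method` variable with its `TOI` (default, Total Order Isolation) and `RSU` values, which the table does not name; the table and column are hypothetical. Run it on one node, wait for that node to return to Synced, then repeat on the next.

```sql
-- Switch this session to Rolling Schema Upgrade: the DDL runs on this node only and is not replicated.
SET SESSION wsrep_OSU_method = 'RSU';
-- Hypothetical schema change.
ALTER TABLE app_db.orders ADD COLUMN note VARCHAR(255) NULL;
-- Restore the default Total Order Isolation method for any later DDL in this session.
SET SESSION wsrep_OSU_method = 'TOI';
```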