Distributed System with Wordpress and MySQL using Virtual Box
Introduction
Some months ago, I was asked to create a distributed system. The system consists of the following:
- A cluster for **high availability** with three database servers running MySQL, able to replicate the data between them.
- A cluster for **load balancing** with two servers running a web server such as Nginx or Apache.
I’m using VirtualBox to create the virtual machines. You can download it from the following link. All the nodes or servers that I’m going to create are going to be based on the following image:
- AlmaLinux 8.4
- Bridged Adapter network configuration.
High Availability Cluster
We must create three virtual machines with the following configuration:

- **Server One**
  - IP: x.x.x.100
  - Hostname: SQL-ONE
- **Server Two**
  - IP: x.x.x.101
  - Hostname: SQL-TWO
- **Server Three**
  - IP: x.x.x.102
  - Hostname: SQL-THREE
If you want to change the IP address of a virtual machine, you can read the following post.
Disclaimer: The `x.x.x` is the network address of your network. You can check it by running the command `ip a` on your host machine.
Replication
You can read more about the configuration of a replication cluster in the following link.
- We must install and configure `mysql` and `mysql-server` on all virtual machines (a minimal install sketch is shown after the next paragraph).
- Once we have installed and configured the corresponding services, we can start configuring a replication technique that will give us a distributed system with high availability. We have two options: classic source-replica replication or Group Replication.
I’m going to explain the Group Replication technique and you can read more about this in the following link.
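Before touching any replication settings, make sure the packages are installed and the service runs on every database node. A minimal sketch for AlmaLinux 8, assuming the stock AppStream packages:

```bash
# Install the MySQL client and server (AlmaLinux 8 AppStream packages)
sudo dnf install -y mysql mysql-server

# Start the service now and enable it at boot
sudo systemctl enable --now mysqld

# Optional: set a root password and remove test data
sudo mysql_secure_installation
```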
- We have to edit the `/etc/my.cnf.d/mysql-server.cnf` file on all virtual machines with the following content:
| Variable | Description |
|---|---|
| skip_name_resolve | Avoids DNS resolution; useful when you are using IP addresses instead of hostnames. |
| server_id | Identifies each node in the cluster; it must be unique per node. |
| disabled_storage_engines | Disables the storage engines that you don't need. |
| datadir | The directory where the data is stored. |
| socket | The socket file. |
| log-error | The error log file. |
| pid-file | The PID file. |
| gtid_mode | Enables GTID mode. |
| enforce_gtid_consistency | Enforces GTID consistency. |
| plugin_load_add | Loads the group replication plugin that we are going to use for Group Replication. |
| group_replication_group_name | The group name; this must be a valid UUID (Universally Unique Identifier). |
| group_replication_start_on_boot | Whether group replication starts at system boot. |
| group_replication_local_address | The local address of this node. |
| group_replication_group_seeds | The addresses of the group's seed members, used by a node when joining. |
[mysqld]
# Skip DNS lookups; nodes are addressed by IP
skip_name_resolve=ON
# Must be unique on each node (1, 2 and 3 in this guide)
server_id=1
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysql/mysqld.log
pid-file=/run/mysqld/mysqld.pid
# Group Replication requires GTID-based replication
gtid_mode=ON
enforce_gtid_consistency=ON
plugin_load_add='group_replication.so'
group_replication_group_name="5add9b42-775c-11ee-be2f-080027f362da"
group_replication_start_on_boot=off
# This node's own IP (Server One here); use .101 and .102 on the other nodes
group_replication_local_address="x.x.x.100:33061"
group_replication_group_seeds="x.x.x.100:33061,x.x.x.101:33061,x.x.x.102:33061"
And then restart the `mysqld` service:
systemctl restart mysqld
Note: See more information related to the variables used here
If you want to start the group replication on boot, you must change the value of the `group_replication_start_on_boot` variable to `ON` and then restart the `mysqld` service.
But be careful: if you shut down all the nodes and then start them again, you must run the bootstrap command on the primary node to start the group replication.
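A note on connectivity: if firewalld is enabled on the database nodes (the AlmaLinux default), the nodes must be able to reach each other on the MySQL port and on the group replication port used in the config above. A sketch, assuming the default firewalld zone:

```bash
# Open the MySQL client port and the group replication port (33061 above)
sudo firewall-cmd --permanent --add-port=3306/tcp
sudo firewall-cmd --permanent --add-port=33061/tcp
sudo firewall-cmd --reload
```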
- We have to create a replication user on all virtual machines
mysql -u root -p
-- Disable binary logging so the user creation is not replicated
SET SQL_LOG_BIN=0;
CREATE USER 'repl'@'%' IDENTIFIED WITH mysql_native_password BY '1850ABc';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
GRANT CONNECTION_ADMIN ON *.* TO 'repl'@'%';
GRANT BACKUP_ADMIN ON *.* TO 'repl'@'%';
GRANT GROUP_REPLICATION_STREAM ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
-- Re-enable binary logging
SET SQL_LOG_BIN=1;
- When you have created the replication user, you must supply the user credentials to the server for use with distributed recovery. You can do this by setting the user credentials as the credentials for the group_replication_recovery channel. Run this command on all nodes
CHANGE REPLICATION SOURCE TO
SOURCE_USER='repl',
SOURCE_PASSWORD='1850ABc' FOR CHANNEL 'group_replication_recovery';
Read the documentation here
- Check the plugins for `group_replication`:
SHOW PLUGINS;
You must check that the `group_replication` plugin is installed; otherwise, you must install it:
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
Read the documentation here
- Bootstrap the group replication. Ensure that you run these commands on the primary node
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
- If you prefer to provide the credentials for the `group_replication_recovery` channel on the START statement itself (MySQL 8.0.23 or later):
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION USER='repl', PASSWORD='1850ABc';
SET GLOBAL group_replication_bootstrap_group=OFF;
If you shut down all the servers, you must always run the bootstrap command on the primary node to start the group replication again.
- Check the status of the group replication
SELECT * FROM performance_schema.replication_group_members;
- To add the other two servers to the group replication, you must run the following command on the other nodes
START GROUP_REPLICATION;
Read the docs here
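After the remaining nodes have joined, querying the members table again should show all three servers online. A rough sketch of what to look for (hostnames and roles will differ if your primary has changed):

```bash
# Run on any node; every member should eventually reach ONLINE
mysql -u root -p -e "SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE \
  FROM performance_schema.replication_group_members;"
# Expected shape:
#   SQL-ONE    ONLINE  PRIMARY
#   SQL-TWO    ONLINE  SECONDARY
#   SQL-THREE  ONLINE  SECONDARY
```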
GENERAL NOTE: You may need to map hostnames to IP addresses for MySQL group replication to work properly. You can achieve this by editing the `/etc/hosts` file on all nodes:
x.x.x.100 SQL-ONE
x.x.x.101 SQL-TWO
x.x.x.102 SQL-THREE
High Availability
You can read in more detail about the configuration of a high availability cluster in the following link.
Once we have the group replication working properly, it is time to configure the high availability cluster. I'm going to use the Keepalived service to achieve this.
- Create a user with access to the `performance_schema` database on the primary node; thanks to the replication technique, the other nodes will get the same user. This user will be used to check whether a node is the primary node.
CREATE USER 'verifier'@'%' IDENTIFIED WITH mysql_native_password BY '1850ABc';
GRANT SELECT ON performance_schema.* TO 'verifier'@'%';
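Before wiring this account into Keepalived, it's worth checking that it can actually run the primary-node query. A quick test, using the password created above:

```bash
# Should print the hostname of the current primary, e.g. SQL-ONE
mysql -u verifier -p'1850ABc' -e "SELECT MEMBER_HOST \
  FROM performance_schema.replication_group_members \
  WHERE MEMBER_ROLE = 'PRIMARY';"
```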
- Install and configure `keepalived` on all nodes (the package is available in the AppStream repository: `sudo dnf install -y keepalived`).
sudo vim /etc/keepalived/keepalived.conf
global_defs {
    router_id mysql
}
# Runs every 2 seconds; on success it adds `weight` to this node's priority
vrrp_script check_mysql {
    script "/bin/check_primary.sh"
    interval 2
    weight 50
    script_security script
}
vrrp_instance VI_01 {
    state BACKUP
    interface enp0s3            # adjust to your NIC name
    virtual_router_id 50
    priority 90                 # same base priority on all nodes; the check script breaks the tie
    virtual_ipaddress {
        x.x.x.167/24            # the floating virtual IP clients connect to
    }
    track_script {
        check_mysql
    }
    authentication {
        auth_type AH
        auth_pass secret
    }
}
- Create the `check_primary.sh` script.
sudo vim /bin/check_primary.sh
#!/bin/sh
# Address and port of the local MySQL node
NODE_IP=$(hostname -I | awk '{print $1}')
NODE_PORT="3306"
# User and password to access MySQL (replace with your credentials)
MYSQL_USER="verifier"
MYSQL_PASSWORD="1850ABc"
# Define a function to check if the node is the primary
is_primary() {
    # Validate that mysqld is running at all
    if [ -z "$(pidof mysqld)" ]; then
        return 1 # Node is not active
    fi
    # Ask the group who the primary is; hostnames in this guide start with 'SQL-'
    result=$(mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -h "$NODE_IP" -P "$NODE_PORT" -e "SELECT MEMBER_HOST FROM performance_schema.replication_group_members WHERE MEMBER_ROLE = 'PRIMARY'" 2>/dev/null | grep 'SQL-')
    hostname=$(hostname)
    # This node is the primary if its own hostname matches the query result
    if [ "$result" = "$hostname" ]; then
        return 0 # Node is the primary
    else
        return 1 # Node is not the primary
    fi
}
# Check if this node is the primary
if is_primary; then
    # Primary: exit 0 so Keepalived raises this node's priority
    exit 0
else
    # Not primary: exit 1 so this node stays a backup
    exit 1
fi
- Give the script execute permissions.
sudo chmod +x /bin/check_primary.sh
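Keepalived only looks at the script's exit code, so it's worth testing it by hand on each node first:

```bash
# Expect exit code 0 on the current primary and 1 everywhere else
sudo /bin/check_primary.sh; echo "exit code: $?"
```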
- Start and enable the `keepalived` service on all nodes.
sudo systemctl start keepalived
sudo systemctl enable keepalived
- Check the IP addresses of all nodes; on one of them you should see the virtual IP address of the cluster.
ip a
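On the node currently holding the VIP, the output shows an extra address on the bridged interface. Roughly what to expect (interface name and flags may differ on your machine):

```bash
ip a show enp0s3
#     inet x.x.x.100/24 brd x.x.x.255 scope global enp0s3
#     inet x.x.x.167/24 scope global secondary enp0s3   # <- the cluster VIP
```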
- Create a database and a user for Wordpress. These commands must be run on the primary node. Remember the credentials that you are going to use:
CREATE DATABASE wordpress;
CREATE USER 'wordpressuser'@'%' IDENTIFIED WITH mysql_native_password BY '1850ABc';
GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpressuser'@'%';
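A quick way to confirm that the VIP and the new credentials work end to end is to connect through the virtual IP from any machine with the MySQL client installed (x.x.x.167 is the VIP configured in Keepalived above):

```bash
# Answers from whichever node currently holds the VIP
mysql -u wordpressuser -p'1850ABc' -h x.x.x.167 wordpress -e "SELECT 1;"
```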
Load Balancing Cluster
We must create three virtual machines with the following configuration:

- **Server Load Balancer**
  - IP: x.x.x.162
  - Hostname: LOAD-BALANCER
- **Server One**
  - IP: x.x.x.160
  - Hostname: SERVER-ONE
- **Server Two**
  - IP: x.x.x.161
  - Hostname: SERVER-TWO
If you want to change the IP address of a virtual machine, you can read the following post.
In this project I'm going to use Wordpress, but you can apply the same configuration to any other application.
Wordpress
Run these commands on the SERVER-ONE and SERVER-TWO virtual machines.
- Install `nginx` on both web nodes
dnf install -y nginx
- Install PHP extensions
sudo dnf install php php-mysqlnd php-fpm php-gd php-xml php-mbstring php-json php-opcache php-zip php-curl -y
- Install Wordpress on `/usr/share/nginx/html` and move the contents to the root directory
cd /usr/share/nginx/html
sudo wget https://wordpress.org/latest.tar.gz
sudo tar -xvzf latest.tar.gz
sudo mv wordpress/* /usr/share/nginx/html
- Remove unnecessary files
sudo rm -rf wordpress latest.tar.gz
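Depending on your umask, the extracted files may not be writable by the web server, which Wordpress needs for uploads and updates. An optional extra step, assuming nginx and php-fpm run as the `nginx` user (the AlmaLinux default):

```bash
# Give the nginx/php-fpm user ownership of the Wordpress files
sudo chown -R nginx:nginx /usr/share/nginx/html
```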
- Copy the wp-config-sample.php file to wp-config.php
sudo cp /usr/share/nginx/html/wp-config-sample.php /usr/share/nginx/html/wp-config.php
- Update the database name, username, and password in the wp-config.php file. You must replace the fields with the following values:
define( 'DB_NAME', 'wordpress' );
define( 'DB_USER', 'wordpressuser' );
define( 'DB_PASSWORD', '1850ABc' );
define( 'DB_HOST', 'x.x.x.167' ); // the virtual IP address of the high availability cluster
- Verify that the `/etc/nginx/conf.d/php-fpm.conf` file is present; otherwise, create it and add the following config so the server can serve PHP files:
upstream php-fpm {
server unix:/run/php-fpm/www.sock;
}
- Update the firewall rules
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
- Start and enable the nginx and php-fpm services
sudo systemctl start nginx
sudo systemctl enable nginx
sudo systemctl start php-fpm
sudo systemctl enable php-fpm
- Set SELinux to permissive mode
sudo setenforce 0
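Keep in mind that `setenforce 0` only lasts until the next reboot. If you want permissive mode to persist for this lab (a production setup would configure the proper SELinux booleans instead), you can also edit the config file:

```bash
# Make permissive mode survive reboots
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
```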
Load Balancer setup
- Install `nginx`
dnf install -y nginx
- Create a configuration file for the load balancer
sudo vim /etc/nginx/conf.d/wordpress.conf
log_format upstreamlog '$host to: $upstream_addr [$request]'
    'upstream_response_time $upstream_response_time '
    'msec $msec request_time $request_time';

# The two Wordpress backends; requests are distributed round-robin
upstream backend {
    # ip_hash;  # uncomment to pin each client to one backend
    server x.x.x.160;
    server x.x.x.161;
}

server {
    listen 80;
    access_log /var/log/nginx/access.log upstreamlog;

    location / {
        proxy_pass http://backend;
        # Rewrite backend-generated URLs so the browser keeps using our host
        sub_filter 'http://backend' 'http://$host';
        sub_filter_once off;
    }
    location ~ \.css {
        add_header Content-Type text/css;
        proxy_pass http://backend;
        sub_filter 'http://backend' 'http://$host';
        sub_filter_once off;
    }
    location ~ \.js {
        add_header Content-Type application/javascript;
        proxy_pass http://backend;
        proxy_set_header Host $host;
    }
}
You can check whether the configuration is right by running the following command
nginx -t
- Update the firewall rules
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
- Start and enable the `nginx` service
sudo systemctl start nginx
sudo systemctl enable nginx
- Check that the load balancer is configured correctly by opening its IP address in a browser
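You can also verify the round-robin behaviour from the command line: send a few requests to the load balancer and watch which backend served each one in the access log (the `upstreamlog` format defined above records `$upstream_addr`):

```bash
# Send a handful of requests to the load balancer (x.x.x.162 in this guide)
for i in 1 2 3 4; do curl -s -o /dev/null http://x.x.x.162/; done

# The upstream address should alternate between x.x.x.160 and x.x.x.161
sudo tail -n 4 /var/log/nginx/access.log
```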
Conclusion
Building a high availability cluster is not an easy task, but it is not impossible. You must have a lot of patience and be willing to learn new things. I hope this project can help you to build your own high availability cluster.
Remember that high availability is not the same as load balancing. You can read more about this here.
If you liked this project, please follow me on LinkedIn, Instagram and GitHub to stay tuned for more projects and be sure to check out my other projects.