Installing BoostIO

  • BoostIO can be deployed in converged, separated, or independent mode. In converged mode, install both the BoostIO server and the SDK using the upper-layer component user account (for example, juiceadmin:juicegroup). In separated and independent modes, create a dedicated server user account (for example, bioadmin:biogroup) to install the BoostIO server, and install the BoostIO SDK using the upper-layer component user account. Do not perform installation operations as the root user, because running as root poses security risks.
  • Communication with Ceph and HDFS is configured by the user. Use secure communication links to ensure communication security.

Creating a BoostIO Server Running User

Ensure that the GID of user group biogroup and the UID of user bioadmin on all physical machines (storage, management, and compute nodes) and containers are not occupied. If the GID or UID is occupied, services may be unavailable.
  • The GID of biogroup is 1000.
  • The UID of bioadmin is 9000.
  • The password of user bioadmin must meet the following complexity requirements:
    • Contain at least eight characters.
    • Contain at least three of the following character types:
      • Lowercase letters
      • Uppercase letters
      • Digits
      • Special characters: spaces and `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
    • The password must be different from the account name.
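The ID availability checks described above can be scripted before the accounts are created. The helper names below are hypothetical (not part of BoostIO); the group and passwd files are parameters so the checks can also be run against a copy.

```shell
# Hypothetical helpers: check whether a GID or UID is already taken.
# The file arguments default to the system databases.
gid_in_use() {  # usage: gid_in_use GID [group_file]
  awk -F: -v id="$1" '$3 == id { found = 1 } END { exit !found }' "${2:-/etc/group}"
}
uid_in_use() {  # usage: uid_in_use UID [passwd_file]
  awk -F: -v id="$1" '$3 == id { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

# Example: warn before creating the example account if either ID is occupied.
# gid_in_use 1000 && echo "GID 1000 is already in use" >&2
# uid_in_use 9000 && echo "UID 9000 is already in use" >&2
```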

Run the following commands on the node where BoostIO is to be installed to create the user account.

  1. Create user group biogroup.
    groupadd -g 1000 biogroup
  2. Create user bioadmin in user group biogroup.
    useradd -g 1000 -d /home/bioadmin -u 9000 -m -s /bin/bash bioadmin

(Optional) Cleaning Up the Environment

  • Before the installation, ensure that BoostIO is not installed in the current environment. If BoostIO is installed, clear the environment to prepare for the new installation.
  • You are advised to delete unused log files from the SDK client in a timely manner to prevent drive space exhaustion.
  • The maximum size of a statistics file is 10 MB on the SDK client and 50 MB on the server. Statistics are collected cyclically. After BoostIO is redeployed and started, a new statistics file is generated. You are advised to clear the old statistics file.
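As a sketch of the recommended log cleanup, the following hypothetical helper deletes SDK log files older than a retention threshold. The /var/log/jfs default and the 7-day threshold are assumptions; adjust both to your retention policy.

```shell
# Sketch: delete log files older than a given age (in days).
# Defaults are illustrative, not mandated by BoostIO.
clean_old_logs() {  # usage: clean_old_logs [dir] [age_in_days]
  find "${1:-/var/log/jfs}" -type f -name '*.log' -mtime "+${2:-7}" -delete
}
```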
  1. Collect the IP addresses of the nodes on which you want to install BoostIO.
  2. Provide at least one NVMe SSD for each node in the cluster and set the SSD owner to the current installation user and user group.
    chown [Server_installation_user:Server_installation_user_group] /dev/nvmexnx
  3. During the initial installation, you need to create the following directories and configure permissions for them:
    Table 1 Directories and permissions

    | Directory | User and User Group | Permission | Description |
    |---|---|---|---|
    | /opt/boostio | Server_installation_user:Server_installation_user_group | 750 | BoostIO installation directory. |
    | /var/log/boostio | Server_installation_user:Server_installation_user_group | 750 | BoostIO server log directory. |
    | /var/log/boostio/trace | Server_installation_user:Server_installation_user_group | 750 | BoostIO statistics log directory. |
    | /home/ip (the IP address is the same as that in the host_ip_list file on each node) | Server_installation_user:Server_installation_user_group | 750 | Directory for storing temporary files during BoostIO installation. The files are automatically deleted after the installation is complete. |
    | /var/log/jfs | SDK_installation_user:SDK_installation_user_group | 750 | BoostIO SDK client log directory. |
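The directories listed in Table 1 can be created with a small helper such as the following sketch. make_bio_dir is hypothetical, and the bioadmin:biogroup default is the example server account from this guide; run the commands as root and substitute your actual installation user and group.

```shell
# Sketch: create one directory from Table 1 with the listed owner and mode.
# make_bio_dir is a hypothetical helper; bioadmin:biogroup is this guide's
# example server account. Run as root during the first installation.
make_bio_dir() {  # usage: make_bio_dir DIR [OWNER] [MODE]
  local dir="$1" owner="${2:-bioadmin:biogroup}" mode="${3:-750}"
  mkdir -p "$dir" && chown "$owner" "$dir" && chmod "$mode" "$dir"
}

# Example:
# for d in /opt/boostio /var/log/boostio /var/log/boostio/trace; do
#   make_bio_dir "$d"
# done
```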

  4. Configure the Ceph key ring permission.

    Starting BoostIO requires reading the Ceph client key and obtaining the corresponding permission. For details, see the Ceph official website.

  5. Add the SO file required by the ZooKeeper client to the {BoostIO_Home}/lib directory, and change its owner to Server_installation_user:Server_installation_user_group and its permissions to 550. Then run the following command on the ZooKeeper server node (ZooKeeper 3.8.1 as an example) to clear the BoostIO cluster information:
    sh /install_path/apache-zookeeper-3.8.1-bin/bin/zkCli.sh
    >>deleteall /cm
  6. Clear the BoostIO drive management metadata.
    dd bs=8k count=1024 if=/dev/zero of=/dev/nvmexnx

Installing BoostIO

  1. Log in to the installation node and upload the ubs_io-boostio-1.0.0-1.{OS_version}.aarch64.rpm software package to any available directory.
  2. Install the software package.
    rpm -ivh --nodeps ubs_io-boostio-1.0.0-1.{OS_version}.aarch64.rpm

    After the installation, the following files are generated in the /home directory:

    BoostIO_{version}_Linux-{arch}_release.tar.gz and the boostio directory extracted from the TAR package. For details about the directory structure, see Table 2.
    Table 2 Directory structure of the installation package

    | Directory | Folder in the Directory | Description |
    |---|---|---|
    | BoostIO | bin | Executable files. |
    | | conf | Configuration files. |
    | | lib | Binary dependency libraries. |
    | | include | Header files. |
    | | scripts | Tool scripts. |

    The tool scripts used to install BoostIO are stored in the scripts directory. See Table 3.

    Table 3 Files in the scripts directory

    | Directory | Tool Name | Usage | Execution Method | Parameter |
    |---|---|---|---|---|
    | scripts | hand_out_deploy.py | Installation tool script. | hand_out_deploy.py [option] | install [pkg_path] [user] [group]; uninstall |
    | | host_ip_list | Communication IP address and drive information configuration file. | Used by hand_out_deploy.py. | - |
    | | install.sh | Installation execution script. | install.sh [option] (invoked by hand_out_deploy.py) | install [user] [group] [install_path]; uninstall |
    | | scp_file.sh | scp command execution file. | Used by hand_out_deploy.py. | - |
    | | ssh_cmd.sh | ssh command execution file. | Used by hand_out_deploy.py. | - |

    Table 4 Files in the bin directory

    | Directory | File Name | Description |
    |---|---|---|
    | bin | bio_daemon | Executable file of the BoostIO service. |
    | | seceasy_encrypt | Executable file of the encryption service. |

    Table 5 Files in the lib directory

    | Directory | File Name | Description |
    |---|---|---|
    | lib | libbdm.so | Shared object file of BDM, which is used for drive management. |
    | | libbio_interceptor_server.so | Shared object file of the bridging service. |
    | | libbio_sdk.so | Shared object file of the BoostIO SDK client. |
    | | libbio_server.so | Shared object file of the BoostIO server. |
    | | libhcom.so | Shared object file of HCOM, which is used for network transmission. |
    | | libhcom_static.a | Static library file of HCOM. |
    | | libhse_cryption.so | Shared object file of hseceasy, which is used for encryption. |
    | | libock_interceptor.so | Shared object file of the bridging service. |
    | | libock_iofwd_proxy.so | Shared object file of the bridging service. |
    | | libsecurec.a | Static library file of the encryption service. |
    | | libexpire_checker.so | Shared object file for SSL certificate checks. |

  3. Configure the installation information.

    Set configuration items of the bio.conf file in the conf directory based on your environment and service requirements. See Table 6.

    Table 6 BoostIO configuration items

    Log
    • bio.log.level: Log level. Default: info. Values: debug, info, warn, trace, error.

    Net
    • bio.net.data.ip_mask: IP address range. Default: 127.0.0.1/24. Format: *.*.*.*/#, where each * ranges from 0 to 255 and # ranges from 0 to 32. When using JuiceFS for big data services, the value must be the same as the IP address that corresponds to the host name in the /etc/hosts file.
    • bio.net.data.listen_port: Network communication port on the service plane. Default: 7201. Range: 7201 to 7800.
    • bio.net.data.protocol: Network protocol. Default: tcp. Values: rdma, tcp.
    • bio.net.rpc.data.busy_polling_mode: Whether to enable busy polling for remote procedure calls (RPC). Default: false. Values: true, false. Available only with RDMA.
    • bio.net.rpc.data.workers_count: Number of worker cores on the RPC data plane. Default: 4. Range: 1 to 16.
    • bio.net.request.executor.thread.num: Number of threads for processing requests at the receive end. Default: 8. Range: 8 to 256.
    • bio.net.request.executor.queue.size: Depth of the request processing queue at the receive end. Default: 1,024. Range: 1,024 to 65,535.
    • bio.net.ipc.data.busy_polling_mode: Whether to enable busy polling for inter-process communication (IPC). Default: false. Values: true, false.
    • bio.net.ipc.data.workers_count: Number of worker cores on the IPC data plane. Default: 4. Range: 1 to 128.
    • bio.net.tls.enable.switch: Network security option. Default: true. Values: true, false. Disabling this option may cause information leakage and spoofing risks. In separated deployment mode, the enableTls parameter passed to the BoostIO service initialization API must match this setting.
    • bio.net.tls.ca.cert.path: Path to the CA certificate. Default: /path/CA/cacert.pem (example only). If the security option is enabled, the path must be valid; if it is disabled, this item is not parsed.
    • bio.net.tls.ca.crl.path: Path to the certificate revocation list (CRL) file. Default: none. If the security option is enabled and certificate revocation must be checked, the path must be valid; if the security option is disabled, this item is not parsed.
    • bio.net.tls.server.cert.path: Path to the server certificate file. Default: /path/server/servercert.pem (example only). If the security option is enabled, the path must be valid; if it is disabled, this item is not parsed.
    • bio.net.tls.server.key.path: Path to the server certificate private key file. Default: /path/server/serverkey.pem (example only). Same validity rule as above.
    • bio.net.tls.server.key.pass.pathPosix: Path to the private key password of the working certificate. Default: /path/server/server.keypass (example only). Same validity rule as above.
    • bio.net.hesc.server.tls.kfs.master.path: Path to the root key generated when encrypting the private key of the working certificate. Default: /path/server/master/kfsa (example only). Same validity rule as above.
    • bio.net.hesc.server.tls.kfs.pass.standby.path: Path to the standby root key generated when encrypting the private key of the working certificate. Default: /path/server/standby/kfs (example only). Same validity rule as above.

    Cache
    • bio.cache.qos.enable: Flow control option. Default: true. Values: true, false. Enabling this option reduces peak performance; you are advised to disable it in performance tests.
    • bio.data.crc.enable: Data integrity verification option. Default: false. Values: true, false. Enabling this option increases read and write latency; you are advised to enable it when locating faults.
    • bio.segment.size_in_mb: Cache resource granularity. Default: 4. Range: 1 to 16. Unit: MB.
    • bio.mem.size_in_gb: Memory capacity used as cache resources. Default: 50. Range: 0 to 512. Unit: GB. The value cannot exceed the system memory size; 0 means the node provides no cache.
    • bio.disk.path: List of drives used as cache resources. Default: /dev/sdxx:/dev/sdyy (example only). Separate multiple drive paths with colons (:). The current version supports a maximum of four drives.
    • bio.rcache.evict_water_level: Eviction watermark of the read cache, as a percentage of the used read cache. Default: 90. Range: 0 to 100.
    • bio.cache.mem_read_write_ratio: Read/write resource ratio of the memory. Default: 5:5. Range: 0:10 to 10:0.
    • bio.cache.disk_read_write_ratio: Read/write resource ratio of the drives. Default: 5:5. Range: 0:10 to 10:0.
    • bio.work.scene: Application scenario flag. Optional. Default: none. Values: none (no usage restriction), bigdata (big data scenarios; unlike AI scenarios, I/Os are forcibly aligned).
    • bio.work.io.alignsize: I/O alignment size. Optional. Default: 1. Range: 1 to 4,194,304. Unit: bytes.
    • bio.wcache.evict_water_level: Eviction watermark of the write cache, as a percentage of the used write cache. Optional. Default: 0. Range: 0 to 100.
    • bio.wcache.negotiate.delay: Eviction negotiation delay. Optional. Default: 100. Range: 50 to 1,000. Unit: ms. Increase the value in scenarios sensitive to foreground write performance to delay eviction; decrease it to accelerate eviction.
    • bio.trace.enable: Process statistics collection option. Default: true. Values: true, false. Enabling this option reduces peak performance; you are advised to disable it in performance tests.

    Underfs
    • bio.underfs.file_system_type: Back-end storage system type. Default: ceph. Values: ceph, hdfs.
    • bio.underfs.ceph.cfg.path: Path to the Ceph configuration file. Default: /etc/ceph/ceph.conf. Mandatory when ceph is selected; cannot be empty and must be an existing path.
    • bio.underfs.ceph.cluster: Ceph cluster name. Default: ceph. Mandatory when ceph is selected; cannot be empty.
    • bio.underfs.ceph.user: Ceph user. Default: client.admin. Mandatory when ceph is selected; cannot be empty.
    • bio.underfs.ceph.pool: Ceph data pools. Default: 0:jfspool0,1:jfspool1. Mandatory when ceph is selected; cannot be empty. Separate multiple entries with commas (,).
    • bio.underfs.hdfs.name_node: NameNode of Hadoop. Optional. Default: default:0. Format: IP_address:Port (each IP octet 0 to 255, port 0 to 65535), matching the address and port specified in the Hadoop configuration file.
    • bio.underfs.hdfs.working_path: Path for storing files in the HDFS system. Optional. Default: /hdfs. Must be a valid path of 255 or fewer characters.

    CM
    • bio.cm.initial.nodes_count: Expected number of nodes during cluster initialization. Default: 2. Range: 2 to 256.
    • bio.cm.copy_num: Data redundancy. Default: 2. Value: 2. The current software version supports only dual copies.
    • bio.cm.pts_count: Number of partitions. Default: 16. Range: 2 to 8,192.
    • bio.cm.register_timeout_sec: Timeout duration of the ZooKeeper heartbeat check. Default: 20. Range: 10 to 60. Unit: s.
    • bio.cm.register_perm_timeout_sec: Time window for determining permanent faults. Default: 60. Range: 60 to 600. Unit: s.
    • bio.cm.zk_host: ZooKeeper service node information. Cannot be empty. Example: 127.0.0.1:2181,127.0.0.2:2181,127.0.0.3:2181 for a three-node ZooKeeper cluster. The IP address segment used by ZooKeeper must be the same as the service IP address segment.

    Prometheus
    • bio.prometheus.exposer: IP address and port number of the Prometheus server. Optional. Default: none. Format: *.*.*.*:#, where each * ranges from 0 to 255 and # ranges from 0 to 65535.
    • bio.prometheus.scrape_interval_sec: Prometheus sampling interval. Optional. Default: 15. Unit: seconds.
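For orientation only, a minimal configuration fragment might look like the following. The key = value syntax and every value shown are assumptions for illustration; always start from the bio.conf file shipped in the conf directory and keep its actual syntax.

```
# Illustrative fragment only; syntax and values are assumptions.
bio.log.level = info
bio.net.data.ip_mask = 192.168.10.0/24
bio.net.data.listen_port = 7201
bio.net.data.protocol = tcp
bio.mem.size_in_gb = 50
bio.disk.path = /dev/nvme0n1:/dev/nvme1n1
bio.underfs.file_system_type = ceph
bio.underfs.ceph.cfg.path = /etc/ceph/ceph.conf
bio.cm.zk_host = 192.168.10.11:2181,192.168.10.12:2181,192.168.10.13:2181
```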

  4. Configure the host_ip_list file.
    1. Open the host_ip_list file.
      vim boostio/scripts/host_ip_list
    2. Press i to enter the insert mode. Add the following content to the host_ip_list file (replace the variables with the actual ones):
      ip1::BoostIO_communication_IP_address_1::Drive_address_1:Drive_address_2
      ip2::BoostIO_communication_IP_address_2::Drive_address_1:Drive_address_2
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
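Each host_ip_list line packs three fields separated by double colons (::): the node IP address, the BoostIO communication IP address, and a colon-separated drive list. The following sketch splits a line for a quick sanity check; parse_host_line is a hypothetical helper and the addresses are examples.

```shell
# Sketch: split a host_ip_list line into its three '::'-separated fields.
parse_host_line() {  # prints: host_ip comm_ip drive_list
  printf '%s\n' "$1" | awk -F'::' '{ print $1, $2, $3 }'
}

# Example:
# parse_host_line '192.168.0.1::192.168.1.1::/dev/nvme0n1:/dev/nvme1n1'
```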
  5. Set the user and user group to which the drives belong.
    chown [Server_installation_user:Server_installation_user_group] [Drive_address_1]
    chown [Server_installation_user:Server_installation_user_group] [Drive_address_2]
  6. Run the installation script.
    python3 hand_out_deploy.py install [Path_to_the_undecompressed_installation_package] [Server_installation_user] [Server_installation_user_group]
    • On openEuler 20.03, you must install and configure Python 3 before running the installation script.
    • All nodes in the cluster must use the same user account and password.
    Enter the server installation user name and password as prompted.
    Figure 1 Command output

    After the installation is complete, all BoostIO files and directories are stored in /opt/boostio.