Setting up Ubuntu Cluster

From Grupo de Inteligencia Computacional (GIC)
Revision as of 18:41, 9 Sep 2010 by Alexsavio

<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> <meta name="generator" content="http://www.nongnu.org/elyxer/"/> <meta name="create-date" content="2010-09-09"/> <link rel="stylesheet" href="http://www.nongnu.org/elyxer/lyx.css" type="text/css" media="screen"/> <title>Technical report of the procedure of setup of a Ubuntu Server computer cluster</title> </head> <body>

Technical report of the procedure of setup of a Ubuntu Server computer cluster

Alexandre Manhães Savio

September 24th, 2010

<a class="toc" name="toc-Section-1">1</a> Master Node - Giclus1

<a class="toc" name="toc-Subsection-1.1">1.1</a> Operating system and network configuration

  • Install Ubuntu Server 10.04.1 amd64
  • Partitions
    • /dev/sda1: 20GB on /
    • /dev/sda2: 20GB on /usr/local
    • /dev/sda3: 130GB on /home
    • /dev/sda5: 280GB on /opt
    • /dev/sda6: 20GB of swap area
  • Add gic group:
    • sudo addgroup gic
    • sudo addgroup alexandre gic
  • Add more users:
    • sudo adduser <user_name> --ingroup gic --disabled-password
    • ssh-keygen -b 4096 -t rsa -C user_name
    • chown -R user_name:root /home/user_name/.ssh
    • chmod 700 /home/user_name/.ssh
    • chmod 400 /home/user_name/.ssh/authorized_keys
  • Edit /etc/network/interfaces

    auto lo
    iface lo inet loopback

    auto eth1
    iface eth1 inet static
    address 192.168.1.81
    netmask 255.255.255.0
    gateway 192.168.1.1

    auto eth0
    iface eth0 inet dhcp
    #for more information about this email me

  • Edit /etc/resolv.conf

    nameserver 10.20.13.6
    nameserver 10.10.13.6
    nameserver 10.30.13.6

  • sudo /etc/init.d/networking restart
  • Install packages:
    • Add "partner" repository, editing /etc/apt/sources.list
    • sudo apt-get update
    • sudo apt-get install ssh molly-guard openssh-blacklist openssh-blacklist-extra ssh-askpass binutils unzip sun-java6-jre
  • Edit /etc/ssh/sshd_config: disable root access and password authentication (PermitRootLogin no, PasswordAuthentication no)
  • Edit /etc/hosts

    127.0.0.1 localhost giclus1
    192.168.1.81 giclus1
    192.168.1.82 giclus2
    192.168.1.83 giclus3
    192.168.1.84 giclus4

  • Edit /etc/hosts.allow

    portmap ypserv ypbind sge_qmaster sge_execd : 192.168.1.81 192.168.1.82 192.168.1.83 192.168.1.84
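
The /etc/hosts entries and the hosts.allow client list above follow a regular pattern, so they can be generated instead of typed on every node. The following is a minimal sketch, assuming the addresses stay consecutive from 192.168.1.81; the function names cluster_hosts and cluster_addresses are hypothetical. It only prints text and does not modify any file.

```shell
#!/bin/sh
# Sketch: print the /etc/hosts entries and the hosts.allow line for the cluster.
# PREFIX, SUBNET, FIRST_OCTET and NODES mirror the values used in this report.
PREFIX=giclus
SUBNET=192.168.1
FIRST_OCTET=81
NODES=4

# One "address name" line per node, for example "192.168.1.81 giclus1".
cluster_hosts() {
    i=1
    while [ "$i" -le "$NODES" ]; do
        echo "$SUBNET.$((FIRST_OCTET + i - 1)) $PREFIX$i"
        i=$((i + 1))
    done
}

# Space-separated address list for the right-hand side of hosts.allow.
cluster_addresses() {
    cluster_hosts | cut -d ' ' -f 1 | tr '\n' ' ' | sed 's/ $//'
}

cluster_hosts
echo "portmap ypserv ypbind sge_qmaster sge_execd : $(cluster_addresses)"
```

The output can be reviewed and then pasted into /etc/hosts and /etc/hosts.allow by hand.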

<a class="toc" name="toc-Subsection-1.2">1.2</a> NIS Server

<a class="toc" name="toc-Subsection-1.3">1.3</a> NFS Kernel Server

  • <a class="URL" href="https://help.ubuntu.com/community/SettingUpNFSHowTo">https://help.ubuntu.com/community/SettingUpNFSHowTo</a>
  • sudo apt-get install nfswatch nfs-kernel-server
  • Edit /etc/exports and add the shares:

    /home giclus1(rw,sync,no_subtree_check) giclus2(rw,sync,no_subtree_check) giclus3(rw,sync,no_subtree_check) giclus4(rw,sync,no_subtree_check)
    /usr/local giclus1(rw,sync,no_subtree_check) giclus2(rw,sync,no_subtree_check) giclus3(rw,sync,no_subtree_check) giclus4(rw,sync,no_subtree_check)
    /opt giclus1(rw,sync,no_subtree_check) giclus2(rw,sync,no_subtree_check) giclus3(rw,sync,no_subtree_check) giclus4(rw,sync,no_subtree_check)

  • sudo exportfs -ra
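
Each /etc/exports line above repeats the same options for every host, which makes typos easy. A small sketch of how the lines could be generated is shown below; exports_line is a hypothetical helper, and the host list and options are simply the ones from this report. It prints to standard output only.

```shell
#!/bin/sh
# Sketch: build one /etc/exports line per shared directory.
OPTIONS="rw,sync,no_subtree_check"
HOSTS="giclus1 giclus2 giclus3 giclus4"

# exports_line DIR: print "DIR host1(options) host2(options) ...".
exports_line() {
    line="$1"
    for host in $HOSTS; do
        line="$line $host($OPTIONS)"
    done
    echo "$line"
}

for dir in /home /usr/local /opt; do
    exports_line "$dir"
done
```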

<a class="toc" name="toc-Subsection-1.4">1.4</a> MSMTP:

  • Create a dedicated Gmail account for monitoring, so that a personal Gmail password is not stored in plaintext on the cluster machines.
  • Install the ca-certificates package

    sudo aptitude install ca-certificates
    sudo update-ca-certificates

  • sudo apt-get install msmtp
  • Edit /etc/msmtprc

    account gmail
    host smtp.gmail.com
    from giclus1@gmail.com
    auth on
    tls on
    tls_trust_file /etc/ssl/certs/ca-certificates.crt
    user giclus1@gmail.com
    password *******
    port 587

    account default : gmail

  • Create a sendmail symlink:
    • sudo ln -s /usr/bin/msmtp /usr/sbin/sendmail
  • Run a test:
    • echo "This is an awesome test email" | msmtp youremail@domain.com
  • If you want mdadm to mail you when something goes wrong:
    • Edit /etc/mdadm/mdadm.conf: MAILADDR giclus1@gmail.com
  • Then run an mdadm test:
    • sudo mdadm --monitor --scan --test --oneshot
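
The test message above has no headers, so it may arrive without a subject. A possible variant is sketched here: test_mail is a hypothetical helper that composes a message with To and Subject headers, which can then be piped to msmtp -t so that the recipient is read from the headers. The block itself only defines the function and does not send anything.

```shell
#!/bin/sh
# Sketch: compose a minimal test message with headers.
# test_mail is a hypothetical helper, not part of msmtp.
test_mail() {
    to="$1"
    printf 'To: %s\n' "$to"
    printf 'Subject: msmtp test from %s\n' "$(uname -n)"
    printf '\n'
    printf 'This is a test email sent through msmtp.\n'
}

# Usage (not run here):
#   test_mail youremail@domain.com | msmtp -t
```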

<a class="toc" name="toc-Subsection-1.5">1.5</a> Share internet connection with the others in the cluster:

<a class="toc" name="toc-Subsubsection-1.5.1">1.5.1</a> UFW Version

  • Enable <a class="URL" href="https://help.ubuntu.com/community/UFW">UFW</a>

    sudo ufw enable
    sudo ufw allow 22/tcp
    sudo ufw allow 22/udp
    sudo ufw allow in on eth1

  • Edit file /etc/ufw/before.rules:

    # nat Table rules
    *nat
    :POSTROUTING ACCEPT [0:0]

    -A POSTROUTING -s 192.168.1.0/24 -o eth1 -j MASQUERADE
    COMMIT

  • Edit /etc/default/ufw
    • Change DEFAULT_FORWARD_POLICY to "ACCEPT"
  • Edit /etc/ufw/sysctl.conf
    • Uncomment:
      • net/ipv4/ip_forward=1
      • net/ipv6/conf/default/forwarding=1
  • Restart ufw:
    • sudo ufw disable
    • sudo ufw enable
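
A forgotten comment character on the forwarding keys is an easy mistake in the steps above. The sketch below shows one way to check for it; forwarding_enabled is a hypothetical helper that takes a sysctl-style file and a key. The demonstration runs on a scratch file, so nothing under /etc is read or changed.

```shell
#!/bin/sh
# forwarding_enabled FILE KEY
# Succeeds when FILE contains an uncommented "KEY=1" line.
# (hypothetical helper; FILE is a sysctl-style configuration file)
forwarding_enabled() {
    grep -Eq "^[[:space:]]*$2[[:space:]]*=[[:space:]]*1[[:space:]]*\$" "$1"
}

# Demonstration on a scratch file.
demo=$(mktemp)
printf '#net/ipv4/ip_forward=1\n' > "$demo"
forwarding_enabled "$demo" net/ipv4/ip_forward || echo "still commented out"
printf 'net/ipv4/ip_forward=1\n' > "$demo"
forwarding_enabled "$demo" net/ipv4/ip_forward && echo "forwarding enabled"
rm -f "$demo"
```

On the master node the same function could be pointed at the real file that holds the key.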

<a class="toc" name="toc-Subsubsection-1.5.2">1.5.2</a> IPTABLES Version

  • sudo iptables -A FORWARD -i eth0 -o eth1 -s 192.168.1.0/24 -m conntrack --ctstate NEW -j ACCEPT
  • sudo iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
  • sudo iptables -A POSTROUTING -t nat -j MASQUERADE
  • sudo iptables-save | sudo tee /etc/iptables.sav
  • Add to /etc/rc.local
    • iptables-restore < /etc/iptables.sav
  • Add to /etc/sysctl.conf
    • net.ipv4.conf.default.forwarding=1
    • net.ipv4.conf.all.forwarding=1
  • sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"
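
To make explicit which interface and subnet each rule above refers to, the commands can be written with parameters. This is only a sketch: nat_rules is a hypothetical helper that prints the commands without running them, and restricting MASQUERADE to the outgoing interface with -o is an assumption that goes beyond the rule listed above.

```shell
#!/bin/sh
# nat_rules LAN_IF WAN_IF SUBNET
# Print (do not run) the forwarding and NAT commands for the given interfaces.
# Assumption: MASQUERADE is limited to WAN_IF, unlike the unrestricted rule above.
nat_rules() {
    lan_if="$1"
    wan_if="$2"
    subnet="$3"
    echo "iptables -A FORWARD -i $lan_if -o $wan_if -s $subnet -m conntrack --ctstate NEW -j ACCEPT"
    echo "iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    echo "iptables -t nat -A POSTROUTING -o $wan_if -j MASQUERADE"
}

# Same interface roles and subnet as in the commands above.
nat_rules eth0 eth1 192.168.1.0/24
```

The printed lines can be reviewed and then run with sudo.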

<a class="toc" name="toc-Subsection-1.6">1.6</a> Sun Grid Engine Master

  • <a class="URL" href="http://biowiki.org/HowToAdministerSunGridEngine">http://biowiki.org/HowToAdministerSunGridEngine</a>
  • <a class="URL" href="https://www.fmrib.ox.ac.uk/phpwiki/index.php/FslSge">https://www.fmrib.ox.ac.uk/phpwiki/index.php/FslSge</a>
  • sudo apt-get install libmotif3 libxpm4
  • Download SGE from: http://www.oracle.com
  • Install SGE (check this):

    mkdir /opt/soft/sge
    mv ge-6.1u6-* ../../soft/sge
    cd ../../soft/sge
    tar xvzf ge-6.1u6-common.tar.gz
    tar xvzf ge-6.1u6-arco.tar.gz
    tar xvzf ge-6.1u6-bin-lx24-amd64.tar.gz
    cd ..
    sudo cp -rdvfa sge /
    cd /sge
    scp -rdv giclus1:/sge/* .
    sudo ./inst_sge -m -x

  • Now go through the interactive install process
  • Add to /etc/bash.bashrc

    # SGE settings
    export SGE_ROOT=/sge
    export SGE_CELL=default
    if [ -e "$SGE_ROOT/$SGE_CELL" ]
    then
      . "$SGE_ROOT/$SGE_CELL/common/settings.sh"
    fi

  • ERROR "[: 359: 11: unexpected operator"
    • On Ubuntu 10.04 LTS, libc version detection fails in util/arch. Around line 244, strings libc.so.6 now returns "GNU C Library (Ubuntu EGLIBC 2.11.1-0ubuntu7) stable release version 2.11.1, by Roland McGrath et al.", where the version number appears twice. The subsequent tests therefore receive "11\n11" instead of "11", and the shell rejects the syntax of the if conditions. I fixed it by appending uniq to this line in /sge/util/arch:

      libc_version=`echo $libc_string | tr ' ,' '\n' | grep "2\." | cut -f 2 -d "." | uniq`

  • ERROR sgemaster and sgeexecd won’t start on boot

    cd /etc/init.d/
    sudo update-rc.d sgeexecd.giclus defaults
    sudo update-rc.d sgemaster.giclus defaults
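
The libc version problem described above can be reproduced outside of util/arch. The sketch below applies the same pipeline to the version string quoted in this report; minor_versions is a hypothetical wrapper around it. Without uniq the pipeline yields two lines, and with uniq a single one.

```shell
#!/bin/sh
# Version string as reported by "strings libc.so.6" on Ubuntu 10.04 (quoted above).
libc_string="GNU C Library (Ubuntu EGLIBC 2.11.1-0ubuntu7) stable release version 2.11.1, by Roland McGrath et al."

# minor_versions STRING: split on spaces and commas, keep tokens containing "2.",
# and print the second dot-separated field of each (hypothetical wrapper).
minor_versions() {
    echo "$1" | tr ' ,' '\n\n' | grep "2\." | cut -f 2 -d "."
}

echo "without uniq:"
minor_versions "$libc_string"
echo "with uniq:"
minor_versions "$libc_string" | uniq
```

Both version tokens ("2.11.1-0ubuntu7)" and "2.11.1") reduce to the same minor number, which is why uniq is enough to collapse them.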

<a class="toc" name="toc-Section-2">2</a> The other nodes: GICLUS{2-3-4}

<a class="toc" name="toc-Subsection-2.1">2.1</a> Operating system and network configuration

  • Install Ubuntu Server 10.04.1 amd64
  • Partitions
    • /dev/sda1: 30GB on /
    • /dev/sda2: 450GB on /local_opt
    • /dev/sda3: 20GB of swap area
  • Add gic group:
    • sudo addgroup gic
    • sudo addgroup alexandre gic
  • Add more users:
    • sudo adduser <user_name> --ingroup gic --disabled-password
    • ssh-keygen -b 4096 -t rsa -C user_name
    • chown -R user_name:root /home/user_name/.ssh
    • chmod 700 /home/user_name/.ssh
    • chmod 400 /home/user_name/.ssh/authorized_keys
  • Edit /etc/network/interfaces

    auto lo
    iface lo inet loopback

    auto eth1
    iface eth1 inet static
    address 192.168.1.8{2,3,4}
    netmask 255.255.255.0
    gateway 192.168.1.81

  • Edit /etc/resolv.conf

    nameserver 10.20.13.6
    nameserver 10.10.13.6
    nameserver 10.30.13.6

  • sudo /etc/init.d/networking restart
  • Install packages:
    • Add "partner" repository, editing /etc/apt/sources.list
    • sudo apt-get update
    • sudo apt-get install ssh molly-guard openssh-blacklist openssh-blacklist-extra ssh-askpass binutils unzip sun-java6-jre
  • Edit /etc/ssh/sshd_config: disable root access and password authentication (PermitRootLogin no, PasswordAuthentication no)
  • Edit /etc/hosts

    127.0.0.1 localhost giclus{2,3,4}
    192.168.1.81 giclus1
    192.168.1.82 giclus2
    192.168.1.83 giclus3
    192.168.1.84 giclus4

  • sudo /etc/init.d/networking restart
  • Upgrade the installed packages:
    • sudo apt-get update
    • sudo apt-get upgrade
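
The {2,3,4} notation used above for the node address and host name stands for one value per node. As an illustration, the eth1 stanza for a given node could be expanded as follows; node_interfaces is a hypothetical helper that only prints the text.

```shell
#!/bin/sh
# node_interfaces N: print the eth1 stanza for node giclusN (N = 2, 3 or 4).
# Hypothetical helper; addresses follow the scheme used in this report.
node_interfaces() {
    case "$1" in
        2|3|4) ;;
        *) echo "node number must be 2, 3 or 4" >&2; return 1 ;;
    esac
    cat <<EOF
auto eth1
iface eth1 inet static
address 192.168.1.8$1
netmask 255.255.255.0
gateway 192.168.1.81
EOF
}

node_interfaces 3
```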

<a class="toc" name="toc-Subsection-2.2">2.2</a> NIS Client

<a class="toc" name="toc-Subsection-2.3">2.3</a> NFS Client

  • sudo apt-get install nfs-common
  • Add to /etc/fstab

    #NFS Cluster mount
    giclus1:/home /home nfs rsize=8192,wsize=8192,timeo=14,intr,rw
    giclus1:/opt /opt nfs rsize=8192,wsize=8192,timeo=14,intr,rw
    giclus1:/usr/local /usr/local nfs rsize=8192,wsize=8192,timeo=14,intr,rw
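
After the fstab entries above are in place and the shares are mounted, it is useful to confirm that all three are really served over NFS. One possible check is sketched here; missing_nfs_mounts is a hypothetical helper that reads a file in /proc/mounts format and prints every directory without an nfs entry.

```shell
#!/bin/sh
# missing_nfs_mounts MOUNTS_FILE DIR...
# Print each DIR that has no nfs-type entry in MOUNTS_FILE
# (fields: device, mount point, filesystem type, ...). Hypothetical helper.
missing_nfs_mounts() {
    mounts_file="$1"
    shift
    for dir in "$@"; do
        awk -v d="$dir" '$2 == d && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$mounts_file" \
            || echo "$dir"
    done
}

# Usage on a node (not run here):
#   missing_nfs_mounts /proc/mounts /home /opt /usr/local
```

An empty output means every listed directory is mounted over NFS.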

<a class="toc" name="toc-Subsection-2.4">2.4</a> Sun Grid Engine Exec Daemon

  • <a class="URL" href="http://biowiki.org/HowToAdministerSunGridEngine">http://biowiki.org/HowToAdministerSunGridEngine</a>
  • <a class="URL" href="https://www.fmrib.ox.ac.uk/phpwiki/index.php/FslSge">https://www.fmrib.ox.ac.uk/phpwiki/index.php/FslSge</a>
  • sudo apt-get install libmotif3 libxpm4
  • Download SGE from: http://www.oracle.com
  • Install SGE (check this):

    mkdir /opt/soft/sge
    mv ge-6.1u6-* ../../soft/sge
    cd ../../soft/sge
    tar xvzf ge-6.1u6-common.tar.gz
    tar xvzf ge-6.1u6-arco.tar.gz
    tar xvzf ge-6.1u6-bin-lx24-amd64.tar.gz
    cd ..
    sudo cp -rdvfa sge /
    cd /sge
    scp -rdv giclus1:/sge/* .
    sudo ./install_execd

  • Now go through the interactive install process
  • Add to /etc/bash.bashrc

    # SGE settings
    export SGE_ROOT=/sge
    export SGE_CELL=default
    if [ -e "$SGE_ROOT/$SGE_CELL" ]
    then
      . "$SGE_ROOT/$SGE_CELL/common/settings.sh"
    fi

  • ERROR "[: 359: 11: unexpected operator"
    • On Ubuntu 10.04 LTS, libc version detection fails in util/arch. Around line 244, strings libc.so.6 now returns "GNU C Library (Ubuntu EGLIBC 2.11.1-0ubuntu7) stable release version 2.11.1, by Roland McGrath et al.", where the version number appears twice. The subsequent tests therefore receive "11\n11" instead of "11", and the shell rejects the syntax of the if conditions. I fixed it by appending uniq to this line in /sge/util/arch:

      libc_version=`echo $libc_string | tr ' ,' '\n' | grep "2\." | cut -f 2 -d "." | uniq`

  • ERROR sgeexecd won’t start on boot

    cd /etc/init.d/
    sudo update-rc.d sgeexecd.giclus defaults
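
The /etc/bash.bashrc snippet above sources settings.sh only when the cell directory exists, so shells on a node without SGE still start cleanly. The guard can be tried against a mock directory tree, as in this sketch; load_sge_settings and the SGE_DEMO variable are hypothetical and exist only for the demonstration.

```shell
#!/bin/sh
# load_sge_settings ROOT [CELL]
# Export SGE_ROOT and SGE_CELL, then source settings.sh if the cell directory exists.
# Returns 1 when the cell directory is missing. Hypothetical helper.
load_sge_settings() {
    SGE_ROOT="$1"
    SGE_CELL="${2:-default}"
    export SGE_ROOT SGE_CELL
    if [ -e "$SGE_ROOT/$SGE_CELL" ]; then
        . "$SGE_ROOT/$SGE_CELL/common/settings.sh"
    else
        return 1
    fi
}

# Demonstration with a mock tree instead of the real /sge.
mock=$(mktemp -d)
mkdir -p "$mock/default/common"
echo 'SGE_DEMO=loaded' > "$mock/default/common/settings.sh"
load_sge_settings "$mock" && echo "settings: $SGE_DEMO"
rm -rf "$mock"
```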

<a class="toc" name="toc-Section-3">3</a> Execute in all nodes

<a class="toc" name="toc-Subsection-3.1">3.1</a> Install NeuroDebian Repository ( http://neuro.debian.net/ )

<a class="toc" name="toc-Subsection-3.2">3.2</a> Other configuration details

  • Local temporary work directory
    • Add to /etc/environment
      • LOCAL_TEMP="/local"
      • (for giclus1) this has been set to "/opt/temp"
    • sudo chown -R alexandre:gic /local
    • sudo chmod -R 770 /local
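
Jobs can then use LOCAL_TEMP for node-local scratch space. The following sketch shows one way a job script might do so; make_scratch is a hypothetical helper, and the fallback to /tmp when LOCAL_TEMP is unset is an assumption, not part of the setup above.

```shell
#!/bin/sh
# make_scratch: create a private per-job directory under $LOCAL_TEMP
# and print its path. Falls back to /tmp if LOCAL_TEMP is unset (assumption).
make_scratch() {
    base="${LOCAL_TEMP:-/tmp}"
    mktemp -d "$base/job.XXXXXX"
}

# Usage inside a job script (not run here):
#   scratch=$(make_scratch)
#   ... write intermediate files to "$scratch" ...
#   rm -rf "$scratch"
```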

Copyright (C) 2010 Alexandre Manhães Savio

</body> </html>