storage - High-availability storage

Question

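I would like to make 2 TB or so available via NFS and CIFS. I am looking for a two (or more) server solution for high availability and, if possible, the ability to load balance across the servers. Any suggestions for clustering or high-availability solutions?

This is for business use, and we plan to grow to 5-10 TB over the next few years. Our facility runs almost 24 hours a day, six days a week. We could tolerate 15-30 minutes of downtime, but we want to minimize data loss. I also want to minimize 3 AM calls.

We are currently running one server with ZFS on Solaris and are looking at AVS for the HA part, but we have had minor issues with Solaris (the CIFS implementation doesn't work with Vista, etc.) that have held us up.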

We have started looking at:

DRBD over GFS (GFS for distributed lock capability)
Gluster (needs client pieces, no native CIFS support?)
Windows DFS (the docs say it only replicates after a file closes?)

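We are looking for a "black box" that serves up data.

We currently snapshot the data in ZFS and send the snapshot over the net to a remote datacenter for offsite backup.

Our original plan was to have a second machine and rsync to it every 10-15 minutes. The issue on a failure would be that ongoing production processes would lose 15 minutes of data and be left "in the middle". It would almost be easier to restart them from the beginning than to figure out where to pick up in the middle. That is what drove us to look at HA solutions.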

score 6 · Accepted Answer

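I've recently deployed hanfs using DRBD as the backend. In my situation I'm running in active/standby mode, but I've also tested it successfully using OCFS2 in primary/primary mode. Unfortunately there isn't much documentation on how best to achieve this, and most of what exists is barely useful at best. If you do go down the drbd route, I highly recommend joining the drbd mailing list and reading all of the documentation. Here's my ha/drbd setup and the script I wrote to handle ha's failures:

DRBD8 is required - this is provided by drbd8-utils and drbd8-source. Once these are installed (I believe they're provided by backports), you can use module-assistant to build and install the module - m-a a-i drbd8. Either depmod -a or reboot at this point; if you depmod -a, you'll need to modprobe drbd.

You'll need a backend partition to use for drbd. Do not make this partition LVM, or you'll hit all sorts of problems. Do not put LVM on the drbd device either, or you'll hit all sorts of problems.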
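
The install steps above can be collected into a short shell sketch. It assumes a Debian-style system where the `drbd8-utils`, `drbd8-source`, and `module-assistant` packages are available; the function names `install_drbd8` and `drbd_module_loaded` are made up for illustration.

```bash
# Sketch only; function names are illustrative, not part of DRBD.

# Build and load the DRBD 8 kernel module (run as root).
install_drbd8() {
    apt-get install drbd8-utils drbd8-source module-assistant &&
    m-a a-i drbd8 &&    # module-assistant auto-install
    depmod -a &&
    modprobe drbd       # needed unless you reboot instead
}

# Succeeds if the drbd module is listed in /proc/modules
# (an alternate file can be passed as the first argument).
drbd_module_loaded() {
    grep -q '^drbd ' "${1:-/proc/modules}"
}
```

After `install_drbd8` finishes, `drbd_module_loaded && echo loaded` is a quick way to confirm the module is in place before writing drbd.conf.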

Hanfs1's /etc/drbd.conf:

global {
        usage-count no;
}
common {
        protocol C;
        disk { on-io-error detach; }
}
resource export {
        syncer {
                rate 125M;
        }
        on hanfs2 {
                address         172.20.1.218:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
        }
        on hanfs1 {
                address         172.20.1.219:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
       }
}

Hanfs2's /etc/drbd.conf:


global {
        usage-count no;
}
common {
        protocol C;
        disk { on-io-error detach; }
}
resource export {
        syncer {
                rate 125M;
        }
        on hanfs2 {
                address         172.20.1.218:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
        }
        on hanfs1 {
                address         172.20.1.219:7789;
                device          /dev/drbd1;
                disk            /dev/sda3;
                meta-disk       internal;
       }
}

Once configured, we need to bring up drbd next.

drbdadm create-md export
drbdadm attach export
drbdadm connect export

Now we must perform an initial synchronization of the data. Obviously, if this is a brand new drbd cluster, it doesn't matter which node you choose as the source.
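
The answer does not show the command that starts the initial sync. As a sketch, assuming DRBD 8.x and the `export` resource defined above: the sync is usually started by promoting one node with `--overwrite-data-of-peer`, and progress shows up in the `ds:` field of /proc/drbd. The helper names below are hypothetical.

```bash
# Sketch only; helper names are hypothetical, not part of DRBD.

# Print the value of one "key:value" field (cs, st, ds, ...) from
# /proc/drbd-style text. The file defaults to /proc/drbd.
drbd_field() {
    local key="$1" file="${2:-/proc/drbd}"
    grep -Eo "(^|[[:space:]])${key}:[A-Za-z/]+" "$file" | head -n 1 | cut -d: -f2
}

# Succeeds once both sides report UpToDate, i.e. the sync has finished.
drbd_in_sync() {
    [ "$(drbd_field ds "$1")" = "UpToDate/UpToDate" ]
}

# Run on the node whose data should win; the peer's copy is overwritten.
start_initial_sync() {
    drbdadm -- --overwrite-data-of-peer primary export
}
```

A typical use would be `start_initial_sync` on one node, then polling with `until drbd_in_sync; do sleep 10; done` before creating the filesystem.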

Once that's done, you'll need to run mkfs.yourchoiceoffilesystem on your drbd device - the device in the config above is /dev/drbd1. http://www.drbd.org/users-guide/p-work.html is a useful document to read while working with drbd.

Heartbeat

Install heartbeat2. (Pretty simple: apt-get install heartbeat2.)

/etc/ha.d/ha.cf on each machine should consist of:

hanfs1:


logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120

ucast eth1 172.20.1.218

auto_failback no

node hanfs1
node hanfs2

hanfs2:


logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120

ucast eth1 172.20.1.219

auto_failback no

node hanfs1
node hanfs2
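
The two ha.cf files above differ only in the `ucast` line, which names the peer's address. A minimal sketch that makes this explicit by generating the file from the peer address; `render_ha_cf` is a hypothetical helper, and the interface name `eth1` is taken from the configs above.

```bash
# Sketch only; render_ha_cf is a hypothetical helper.

# Print an ha.cf for a node whose peer answers on the given address.
render_ha_cf() {
    local peer_ip="$1"
    cat <<EOF
logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120

ucast eth1 ${peer_ip}

auto_failback no

node hanfs1
node hanfs2
EOF
}
```

On hanfs1 this would be `render_ha_cf 172.20.1.218 > /etc/ha.d/ha.cf`, and on hanfs2 `render_ha_cf 172.20.1.219 > /etc/ha.d/ha.cf`.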

/etc/ha.d/haresources should be the same on both ha boxes:

hanfs1 IPaddr::172.20.1.230/24/eth1
hanfs1 HeartBeatWrapper

I wrote a wrapper script to deal with the idiosyncrasies that nfs and drbd cause in a failover scenario. This script should exist within /etc/ha.d/resource.d/ on each machine.



#!/bin/bash

#heartbeat fails hard.
#so this is a wrapper
#to get around that stupidity
#I'm just wrapping the heartbeat scripts, except for in the case of umount
#as they work, mostly

if [[ -e /tmp/heartbeatwrapper ]]; then
    runningpid=$(cat /tmp/heartbeatwrapper)
    if [[ -z $(ps --no-heading -p $runningpid) ]]; then
        echo "PID found, but process seems dead.  Continuing."
    else

        echo "PID found, process is alive, exiting."

        exit 7

    fi

fi                                                            

echo $$ > /tmp/heartbeatwrapper

if [[ x$1 == "xstop" ]]; then

/etc/init.d/nfs-kernel-server stop #>/dev/null 2>&1

#NFS init script isn't LSB compatible, exit codes are 0 no matter what happens.
#Thanks guys, you really make my day with this bullshit.
#Because of the above, we just have to hope that nfs actually catches the signal
#to exit, and manages to shut down its connections.
#If it doesn't, we'll kill it later, then term any other nfs stuff afterwards.
#I found this to be an interesting insight into just how badly NFS is written.

sleep 1

#we don't want to shutdown nfs first!
#The lock files might go away, which would be bad.

#The above seems to not matter much, the only thing I've determined
#is that if you have anything mounted synchronously, it's going to break
#no matter what I do.  Basically, sync == screwed; in NFSv3 terms.      
#End result of failing over while a client that's synchronous is that   
#the client hangs waiting for its nfs server to come back - thing doesn't
#even bother to time out, or attempt a reconnect.                        
#async works as expected - it insta-reconnects as soon as a connection seems
#to be unstable, and continues to write data.  In all tests, md5sums have   
#remained the same with/without failover during transfer.                   

#So, we first unmount /export - this prevents drbd from having a shit-fit
#when we attempt to turn this node secondary.                            

#That's a lie too, to some degree. LVM is entirely to blame for why DRBD
#was refusing to unmount.  Don't get me wrong, having /export mounted doesn't
#help either, but still.                                                     
#fix a usecase where one or other are unmounted already, which causes us to terminate early.

if [[ "$(grep -o /varlibnfs/rpc_pipefs /etc/mtab)" ]]; then
    for ((test=1; test <= 10; test++)); do
        umount /export/varlibnfs/rpc_pipefs  >/dev/null 2>&1
        rc=$?   #keep umount's exit status; the mtab check below would overwrite $?
        if [[ -z $(grep -o /varlibnfs/rpc_pipefs /etc/mtab) ]]; then
            break
        fi
        if [[ $rc -ne 0 ]]; then
            #try again, harder this time
            umount -l /var/lib/nfs/rpc_pipefs  >/dev/null 2>&1
            if [[ -z $(grep -o /varlibnfs/rpc_pipefs /etc/mtab) ]]; then
                break
            fi
        fi
    done
    #the counter only exceeds 10 if every attempt failed
    if [[ $test -gt 10 ]]; then
        rm -f /tmp/heartbeatwrapper
        echo "Problem unmounting rpc_pipefs"
        exit 1
    fi
fi

if [[ "$(grep -o /dev/drbd1 /etc/mtab)" ]]; then
    for ((test=1; test <= 10; test++)); do
        umount /export  >/dev/null 2>&1
        rc=$?   #keep umount's exit status; the mtab check below would overwrite $?
        if [[ -z $(grep -o /dev/drbd1 /etc/mtab) ]]; then
            break
        fi
        if [[ $rc -ne 0 ]]; then
            #try again, harder this time
            umount -l /export  >/dev/null 2>&1
            if [[ -z $(grep -o /dev/drbd1 /etc/mtab) ]]; then
                break
            fi
        fi
    done
    #the counter only exceeds 10 if every attempt failed
    if [[ $test -gt 10 ]]; then
        rm -f /tmp/heartbeatwrapper
        echo "Problem unmounting /export"
        exit 1
    fi
fi


#now, it's important that we shut down nfs. it can't write to /export anymore, so that's fine.
#if we leave it running at this point, then drbd will screwup when trying to go to secondary.  
#See contradictory comment above for why this doesn't matter anymore.  These comments are left in
#entirely to remind me of the pain this caused me to resolve.  A bit like why churches have Jesus
#nailed onto a cross instead of chilling in a hammock.                                           

pidof nfsd | xargs kill -9 >/dev/null 2>&1

sleep 1                                   

if [[ -n $(ps aux | grep nfs | grep -v grep) ]]; then
    echo "nfs still running, trying to kill again"   
    pidof nfsd | xargs kill -9 >/dev/null 2>&1       
fi                                                   

sleep 1

/etc/init.d/nfs-kernel-server stop #>/dev/null 2>&1

sleep 1

#next we need to tear down drbd - easy with the heartbeat scripts
#it takes input as resourcename start|stop|status                
#First, we'll check to see if it's stopped                       

/etc/ha.d/resource.d/drbddisk export status >/dev/null 2>&1
if [[ $? -eq 2 ]]; then                                    
    echo "resource is already stopped for some reason..."  
else                                                       
    for ((i=1; i <= 10; i++)); do                          
        /etc/ha.d/resource.d/drbddisk export stop >/dev/null 2>&1
        if [[ $(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2) == "Secondary/Secondary" ]] || [[ $(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2) == "Secondary/Unknown" ]]; then                                                                                                                             
            echo "Successfully stopped DRBD"                                                                                                             
            break                                                                                                                                        
        else                                                                                                                                             
            echo "Failed to stop drbd for some reason"                                                                                                   
            cat /proc/drbd                                                                                                                               
            if [[ $i -eq 10 ]]; then                                                                                                                     
                    exit 50                                                                                                                              
            fi                                                                                                                                           
        fi                                                                                                                                               
    done                                                                                                                                                 
fi                                                                                                                                                       

rm -f /tmp/heartbeatwrapper                                                                                                                              
exit 0                                                                                                                                                   


elif [[ x$1 == "xstart" ]]; then

#start up drbd first
/etc/ha.d/resource.d/drbddisk export start >/dev/null 2>&1
if [[ $? -ne 0 ]]; then                                   
    echo "Something seems to have broken. Let's check possibilities..."
    testvar=$(egrep -o "st:[A-Za-z/]*" /proc/drbd | cut -d: -f2)       
    if [[ $testvar == "Primary/Unknown" ]] || [[ $testvar == "Primary/Secondary" ]]
    then                                                                           
        echo "All is fine, we are already the Primary for some reason"             
    elif [[ $testvar == "Secondary/Unknown" ]] || [[ $testvar == "Secondary/Secondary" ]]
    then                                                                                 
        echo "Trying to assume Primary again"                                            
        /etc/ha.d/resource.d/drbddisk export start >/dev/null 2>&1                       
        if [[ $? -ne 0 ]]; then                                                          
            echo "I give up, something's seriously broken here, and I can't help you to fix it."
            rm -f /tmp/heartbeatwrapper                                                         
            exit 127                                                                            
        fi                                                                                      
    fi                                                                                          
fi                                                                                              

sleep 1                                                                                         

#now we remount our partitions                                                                  

for ((test=1; test <= 10; test++)); do
    mount /dev/drbd1 /export >/tmp/mountoutput
    if [[ -n $(grep -o export /etc/mtab) ]]; then
        break
    fi
done

#the counter only exceeds 10 if every mount attempt failed
if [[ $test -gt 10 ]]; then
    rm -f /tmp/heartbeatwrapper
    exit 125
fi


#I'm really unsure at this point of the side-effects of not having rpc_pipefs mounted.          
#The issue here, is that it cannot be mounted without nfs running, and we don't really want to start
#nfs up at this point, lest it ruin everything.                                                     
#For now, I'm leaving mine unmounted, it doesn't seem to cause any problems.                        

#Now we start up nfs.

/etc/init.d/nfs-kernel-server start >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "There's not really that much that I can do to debug nfs issues."
    echo "probably your configuration is broken.  I'm terminating here."
    rm -f /tmp/heartbeatwrapper
    exit 129
fi

#And that's it, done.

rm -f /tmp/heartbeatwrapper
exit 0


elif [[ "x$1" == "xstatus" ]]; then

#Lets check to make sure nothing is broken.

#DRBD first
/etc/ha.d/resource.d/drbddisk export status >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

#mounted?
grep -q drbd /etc/mtab >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

#nfs running?
/etc/init.d/nfs-kernel-server status >/dev/null 2>&1
if [[ $? -ne 0 ]]; then
    echo "stopped"
    rm -f /tmp/heartbeatwrapper
    exit 3
fi

echo "running"
rm -f /tmp/heartbeatwrapper
exit 0


fi
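
Before handing the wrapper to heartbeat, it can be exercised by hand. The sketch below maps the exit codes the script uses for `status` (0 and 3) and for a held lock file (7) to readable messages. `describe_status_code` and `check_wrapper` are hypothetical helper names, and the default path assumes the script was saved under the `HeartBeatWrapper` name used in haresources.

```bash
# Sketch only; helper names are hypothetical.

WRAPPER="${WRAPPER:-/etc/ha.d/resource.d/HeartBeatWrapper}"

# Translate an exit code from the wrapper into a short message.
describe_status_code() {
    case "$1" in
        0) echo "running" ;;
        3) echo "stopped" ;;
        7) echo "another wrapper instance holds /tmp/heartbeatwrapper" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

# Run the wrapper's status action and describe the result.
check_wrapper() {
    "$WRAPPER" status >/dev/null 2>&1
    describe_status_code $?
}
```

Running `check_wrapper` on each node should report "running" on the active node and "stopped" on the standby.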

Once all of the above is done, you'll just need to configure /etc/exports:

/export 172.20.1.0/255.255.255.0(rw,sync,fsid=1,no_root_squash)

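Then it's just a case of starting up heartbeat on both machines and issuing hb_takeover on one of them. You can test that it's working by making sure the node you issued the takeover on is primary: check /proc/drbd, check that the device is mounted correctly, and check that you can access nfs.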
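
The post-takeover checks can be scripted. This is a minimal sketch, assuming the `st:` role field in /proc/drbd that the wrapper script also parses, and the /dev/drbd1 and /export names from the configuration above; `verify_takeover` is a hypothetical helper.

```bash
# Sketch only; verify_takeover is a hypothetical helper.

# Check that this node is DRBD Primary and has /dev/drbd1 mounted on /export.
# Alternate files can be passed in place of /proc/drbd and /etc/mtab.
verify_takeover() {
    local proc="${1:-/proc/drbd}" mtab="${2:-/etc/mtab}"
    if ! grep -q 'st:Primary/' "$proc"; then
        echo "drbd is not Primary on this node"
        return 1
    fi
    if ! grep -q '^/dev/drbd1 /export ' "$mtab"; then
        echo "/dev/drbd1 is not mounted on /export"
        return 1
    fi
    echo "ok"
}
```

For the nfs part, running `showmount -e 172.20.1.230` from a client (the floating address from haresources) should list /export.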

--

Best of luck. Setting this up from scratch was, for me, an extremely painful experience.

score 3 · Accepted Answer

These days 2 TB fits in one machine, so you've got options, from simple to complex. These all presume Linux servers:

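You can get poor-man's HA by setting up two machines and doing a periodic rsync from the main one to the backup.
You can use DRBD to mirror one to the other at the block level. This has the disadvantage of being somewhat difficult to expand in the future.
You can use OCFS2 to cluster the disks instead, for future expandability.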
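
As an illustration of the first option, here is a minimal sketch of one mirror pass. The standby host name `backup-host` and the /export path are placeholders, and `sync_once` is a hypothetical helper.

```bash
# Sketch only; host name and paths are placeholders.

RSYNC="${RSYNC:-rsync}"
SRC="${SRC:-/export/}"
DEST="${DEST:-backup-host:/export/}"   # hypothetical standby machine

# One mirror pass: -a keeps permissions and times, --delete removes
# files on the backup that no longer exist on the main server.
sync_once() {
    "$RSYNC" -a --delete "$SRC" "$DEST"
}
```

Called from cron (for example `*/15 * * * *` pointing at a script that runs `sync_once`), this keeps the backup at most one interval behind. Anything written since the last pass is lost on failover, which is the trade-off of this option.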

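There are also plenty of commercial solutions, but 2 TB is a bit small for most of them these days.

You haven't mentioned your application yet, but if hot failover isn't necessary and all you really want is something that will stand up to losing a disk or two, find a NAS that supports RAID-5, at least 4 drives, and hot swap, and you should be good to go.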

score 1 · Accepted Answer

I would recommend NAS (Network Attached Storage).

HP has some nice ones you can choose from.

http://h18006.www1.hp.com/storage/aiostorage.html

There is also a clustered version:

http://h18006.www1.hp.com/storage/software/clusteredfs/index.html?jumpid=reg_R1002_USEN

score 0 · Accepted Answer

Have a look at Amazon Simple Storage Service (Amazon S3).

http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2?ie=UTF8&node=16427261&no=3435361&me=A36L942TSJ2AJA

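-- This may be of interest regarding high availability:

Dear AWS Customer:

Many of you have asked us to let you know ahead of time about features and services that are currently under development, so that you can better plan for how that functionality might integrate with your applications. To that end, we are excited to share some early details about a new offering we have under development here at AWS: a content delivery service.

This new service will provide a high-performance method of distributing content to end users, giving your customers low latency and high data transfer rates when they access your objects. The initial release will help developers and businesses who need to deliver popular, publicly readable content over HTTP connections. Our goal is to create a content delivery service that:

Lets developers and businesses get started easily - there are no minimum fees and no commitments. You only pay for what you actually use.
Is simple and easy to use - a single, simple API call is all that is needed to start delivering your content.
Works seamlessly with Amazon S3 - this gives you durable storage for the original, definitive versions of your files while making the content delivery service easier to use.
Has a global presence - we use a global network of edge locations on three continents to deliver your content from the most appropriate location.

You'll start by storing the original version of your objects in Amazon S3, making sure they are publicly readable. Then you'll make a simple API call to register your bucket with the new content delivery service. This API call will return a new domain name for you to include in your web pages or application. When clients request an object using this domain name, they will be automatically routed to the nearest edge location for high-performance delivery of your content. It's that simple.

We're currently working with a small group of private beta customers, and expect to have this service widely available before the end of the year. If you'd like to be notified when we launch, please let us know by clicking here.

Sincerely,

The Amazon Web Services Team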

score 0 · Accepted Answer

You could look at the Mirror File System. It does file replication at the file system level. The same file on both the primary and the backup system is a live file.

http://www.linux-ha.org/RelatedTechnologies/Filesystems

score 0 · Accepted Answer

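Can I assume from the body of your question that you're a business user? I purchased a 6 TB RAID 5 unit from Silicon Mechanics and have it NAS attached, and my engineer installed NFS on our servers. Backups are performed via rsync to another large-capacity NAS.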

score 0 · Accepted Answer

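Are you looking for an "enterprise" solution or a "home" solution? It is hard to tell from your question, because 2 TB is very small for an enterprise and a little on the high end for a home user (especially with two servers). Could you clarify the need so we can discuss the tradeoffs?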

score 0 · Accepted Answer

Your best bet may be to work with experts who do this sort of thing for a living. These guys are actually in our office complex... I've had a chance to work with them on a similar project that I was leading.

http://www.deltasquare.com/About

score 0 · Accepted Answer

I would suggest visiting the F5 site and checking out http://www.f5.com/solutions/virtualization/file/

score 0 · Accepted Answer

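There are two ways to go at this. The first is to just buy a SAN or a NAS from Dell or HP and throw money at the problem. Modern storage hardware makes all of this easy to do, saving your expertise for more core problems.

If you want to roll your own, take a look at using Linux with DRBD.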

http://www.drbd.org/

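DRBD allows you to create networked block devices. Think RAID 1 across two servers instead of just two disks. DRBD deployments usually use Heartbeat for failover in case one system dies.

I'm not sure about load balancing, but you might investigate whether LVS can be used to load balance across your DRBD hosts: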

http://www.linuxvirtualserver.org/

To conclude, let me just reiterate that you're probably going to save yourself a lot of time in the long run by simply paying for a NAS.
