
We have a number of these custom data management systems, all aiming towards one common goal: making data management as scalable as possible.

By scalability, what I mean is that the cost of usage should not go up drastically when the size of the data increases.

RDBMSs become slow when the amount of data is large, because the number of indirections invariably increases, leading to more I/Os.

[figure: index lookup hierarchy in an RDBMS]
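To put rough numbers on the indirection claim above, here is a back-of-envelope sketch (Python) of how the depth of a B-tree-style index, and hence the worst-case number of disk seeks per lookup, grows with table size. The fanout is an assumed, illustrative figure, not any particular engine's:

```
# Back-of-envelope: how many index levels (and hence potential disk seeks)
# a B-tree-style index needs as the table grows. The fanout is a made-up
# but plausible order of magnitude, not any specific engine's value.
import math

def btree_levels(rows: int, fanout: int = 500) -> int:
    """Depth of a B-tree holding `rows` keys with `fanout` children per node."""
    return max(1, math.ceil(math.log(rows, fanout)))

for rows in (10**6, 10**9, 10**12):
    print(f"{rows:>15,d} rows -> ~{btree_levels(rows)} levels (seeks per lookup)")
```

The depth grows only logarithmically, but each extra level is potentially another disk seek unless the upper levels stay cached in memory.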

How do these custom, scalability-friendly data management systems solve the problem?

This is a figure from this document explaining Google BigTable:

[figure: Bigtable's tablet location hierarchy, from the Bigtable paper]

Looks the same to me. How is the ultra-scalability achieved?


4 Answers


To speak specifically to your question about Bigtable, the difference is that the hierarchy in the diagram above is all there is. Each Bigtable tablet server is responsible for a set of tablets (contiguous row ranges from a table); the mapping from row range to tablet is maintained in the metadata table, and the mapping from tablet to tablet server is maintained in memory on the Bigtable master. Looking up a row, or a range of rows, requires looking up the metadata entry (which will almost certainly be in memory on the server hosting it) and then using it to look up the actual row on the server responsible for it, so it takes only one, or a small number of, disk seeks.

In a nutshell, the reason this scales well is that you can throw more hardware at it: given enough resources, the metadata is always in memory, so you never have to go to disk for it, only for the data itself (and not always even for that!).
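As a minimal sketch of the lookup path described above, here is some illustrative Python. The names are made up and plain in-memory dicts stand in for the metadata table and the master's map; this is not Bigtable's actual API:

```
# A minimal sketch of the two-level lookup: row key -> tablet -> tablet server.
# Hypothetical names; in-memory structures stand in for the METADATA table
# and the master's tablet-to-server map.
import bisect

# Tablet boundaries: each tablet covers rows [start, next_start).
tablet_starts = ["", "row100", "row200", "row300"]   # sorted start keys
tablets       = ["tablet-0", "tablet-1", "tablet-2", "tablet-3"]

# Conceptually maintained in the metadata/master, all in memory.
tablet_to_server = {
    "tablet-0": "tserver-a",
    "tablet-1": "tserver-b",
    "tablet-2": "tserver-a",
    "tablet-3": "tserver-c",
}

def locate(row_key: str) -> tuple[str, str]:
    """Resolve a row key to (tablet, tablet server) using only in-memory lookups."""
    i = bisect.bisect_right(tablet_starts, row_key) - 1
    tablet = tablets[i]
    return tablet, tablet_to_server[tablet]

print(locate("row150"))   # ('tablet-1', 'tserver-b'); only the final row read touches disk
```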

Answered 2010-08-16T10:43:05.477

The "traditional" SQL DBMS market really means a very small number of products, which have traditionally targeted business applications in a corporate setting. Massive shared-nothing scalability has not historically been a priority for those products or their customers. So it is natural that alternative products have emerged to support internet scale database applications.

This has nothing to do with the fact that these new products are not "Relational" DBMSs. The relational model can scale just as well as any other model. Arguably the relational model suits these types of massively scalable applications better than, say, network (graph-based) models. It's just that the SQL language has a lot of disadvantages, and no one has yet come up with a suitable relational NoSQL (non-SQL) alternative.

Answered 2010-08-16T05:47:00.177

It's about using cheap commodity hardware to build a network/grid/cloud and spreading the data and load across it (for example, using map/reduce).
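For instance, here's a toy sketch of that idea: hash-partition records across a handful of made-up nodes and run a word-count-style map/reduce over the shards (Python; real frameworks add scheduling, shuffling, and fault tolerance on top of this):

```
# Toy sketch of spreading data and work across commodity nodes, assuming a
# simple hash-partitioning scheme and a word-count-style map/reduce job.
# Node names are made up; this is only an illustration of the idea.
from collections import Counter

NODES = ["node-1", "node-2", "node-3"]

def partition(records: list[str]) -> dict[str, list[str]]:
    """Assign each record to a node by hashing it (crude sharding)."""
    shards: dict[str, list[str]] = {n: [] for n in NODES}
    for r in records:
        shards[NODES[hash(r) % len(NODES)]].append(r)
    return shards

def map_phase(shard: list[str]) -> Counter:
    """Each node counts words in its own shard (in parallel in a real system)."""
    return Counter(word for line in shard for word in line.split())

def reduce_phase(partials: list[Counter]) -> Counter:
    """Merge the per-node partial results."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

lines = ["the quick brown fox", "the lazy dog", "the quick dog"]
shards = partition(lines)
print(reduce_phase([map_phase(s) for s in shards.values()]))
```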

RDBMS databases seem to me like software that was (originally) designed to run on one supercomputer. You can use various hard drive arrays, DB clusters, and so on, but it's still fundamentally a single-machine design.

The amount of data has also increased, so there's one more reason to design new data stores with this in mind: scalability, high availability, terabytes of data.

Another thing: if you build a grid/cloud from cheap servers, it's fault tolerant, because you store all the data in three (?) different locations, and at the same time it's cheap.
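A toy illustration of that "three locations" idea, picking replica nodes for each key; the node names and hashing scheme here are arbitrary, and real stores use consistent hashing, rack awareness, and so on:

```
# Pick three replica nodes per key so each item survives two machine failures.
# Arbitrary node list and hashing; purely illustrative.
import hashlib

NODES = ["srv-1", "srv-2", "srv-3", "srv-4", "srv-5"]
REPLICAS = 3

def replica_nodes(key: str) -> list[str]:
    """Choose REPLICAS distinct nodes for a key, starting at a hashed position."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

for key in ("user:42", "order:9001"):
    print(key, "->", replica_nodes(key))
```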

Back to your pictures: the first one is (typically) from a single computer, the second one from a network of computers.

Answered 2010-08-15T20:27:33.900

One theoretical answer on scalability is at http://queue.acm.org/detail.cfm?id=1394128 - the ACID guarantees are expensive. See http://database.cs.brown.edu/papers/stonebraker-cacm2010.pdf for a counter-argument.

In fact, just surviving power failures is expensive. Years ago I compared MySQL against Oracle. MySQL was almost unbelievably faster than Oracle, but we couldn't use it. The MySQL of those days was built on top of Berkeley DB, which was miles faster than Oracle's full-blown log-based database, but if the power went off while Berkeley-DB-based MySQL was running, it was a manual process to get the database consistent again when the power came back on, and you'd probably lose recent updates for good.
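A rough way to see the cost of that durability for yourself: time a batch of small "commit" writes with and without forcing each one to stable storage via fsync. This is a toy measurement, not a benchmark of MySQL or Oracle, and the numbers will vary wildly by disk:

```
# Rough illustration of why durable commits are expensive: flushing every
# "commit" to disk with fsync versus letting the OS buffer the writes.
import os, tempfile, time

def timed_writes(n: int, sync_each: bool) -> float:
    fd, path = tempfile.mkstemp()
    start = time.perf_counter()
    try:
        for _ in range(n):
            os.write(fd, b"commit record\n")
            if sync_each:
                os.fsync(fd)        # pay a disk flush on every commit
        os.fsync(fd)                # final flush either way
    finally:
        os.close(fd)
        os.remove(path)
    return time.perf_counter() - start

print("buffered     :", timed_writes(1000, sync_each=False))
print("fsync/commit :", timed_writes(1000, sync_each=True))
```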

Answered 2010-08-16T05:12:05.720