sql - What are the use cases for selecting CHAR over VARCHAR in SQL?

Question

I realize that CHAR is recommended if all my values are fixed-width. But so what? Why not just pick VARCHAR for all text fields, to be safe?

score 394 · Accepted Answer

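The general rule is to pick CHAR if all rows will have close to the same length. Pick VARCHAR (or NVARCHAR) when the length varies significantly. CHAR may also be a bit faster because all the rows are of the same length.

It varies by DB implementation, but generally VARCHAR (or NVARCHAR) uses one or two more bytes of storage (for length or termination) in addition to the actual data. So, assuming a one-byte character set, storing the word "FooBar" takes:

CHAR(6) = 6 bytes (no overhead)
VARCHAR(100) = 8 bytes (2 bytes of overhead)
CHAR(10) = 10 bytes (4 bytes of waste)

The bottom line is that CHAR can be faster and more space-efficient for data of relatively the same length (within two characters of length difference).

Note: Microsoft SQL Server has 2 bytes of overhead for a VARCHAR. This may vary from DB to DB, but generally at least 1 byte of overhead is needed to indicate the length or EOL on a VARCHAR.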
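
The byte counts above can be reproduced with a small Python sketch. It is only a simplified model, not any engine's actual on-disk format: it assumes a single-byte character set, a CHAR(n) that always occupies n bytes, and a VARCHAR that costs the actual length plus a fixed overhead (2 bytes here, as in the SQL Server note). The function names are made up for illustration.

```python
# Simplified storage model (an assumption, not a real on-disk format):
# single-byte characters, CHAR(n) always occupies n bytes, VARCHAR occupies
# the actual length plus a fixed per-value overhead.

def char_bytes(declared_len: int) -> int:
    """Bytes used by a CHAR(declared_len) value; the padding is always stored."""
    return declared_len


def varchar_bytes(value: str, declared_len: int, overhead: int = 2) -> int:
    """Bytes used by a VARCHAR(declared_len) column holding `value`."""
    if len(value) > declared_len:
        raise ValueError("value does not fit in the declared length")
    # The declared maximum only limits what may be stored; it is not reserved.
    return len(value) + overhead


word = "FooBar"
sizes = {
    "CHAR(6)": char_bytes(6),
    "VARCHAR(6)": varchar_bytes(word, 6),
    "VARCHAR(100)": varchar_bytes(word, 100),
    "CHAR(10)": char_bytes(10),
}
for column_type, size in sizes.items():
    print(f"{column_type:<13}{size} bytes")
```

In this model the declared VARCHAR length makes no difference to the stored size, and CHAR only wins while the padding it stores is smaller than the VARCHAR overhead.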

As Gaven pointed out in the comments, things change when it comes to multi-byte character sets, and that is a case where VARCHAR becomes a much better choice.
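
A rough way to see why, sketched in Python. The model is an assumption made for illustration: a fixed-width column in a multi-byte character set is taken to reserve the worst-case number of bytes per character (4 for UTF-8), while a variable-width column stores only the encoded bytes plus a hypothetical 2-byte length prefix. Real engines differ in how they lay out fixed-width multi-byte columns.

```python
# Assumed model: fixed-width columns reserve the worst case per character,
# variable-width columns store the encoded bytes plus a length prefix.
MAX_BYTES_PER_CHAR = 4  # longest UTF-8 encoding of a single code point
VARCHAR_OVERHEAD = 2    # hypothetical length prefix


def fixed_width_bytes(declared_len: int) -> int:
    """Worst-case reservation for a fixed-width column of declared_len characters."""
    return declared_len * MAX_BYTES_PER_CHAR


def variable_width_bytes(value: str) -> int:
    """Encoded size of the value plus the assumed length prefix."""
    return len(value.encode("utf-8")) + VARCHAR_OVERHEAD


for text in ("FooBar", "東京都"):
    print(text, fixed_width_bytes(6), variable_width_bytes(text))
```

Under these assumptions the fixed-width column pays for the widest possible character on every row, whatever the data actually contains.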

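A note about the declared length of a VARCHAR: because it stores the length of the actual content, you do not waste unused length. So storing 6 characters in VARCHAR(6), VARCHAR(100), or VARCHAR(MAX) uses the same amount of storage. Read more about the differences when using VARCHAR(MAX). You declare a maximum size in VARCHAR to limit how much is stored.

In the comments, AlwaysLearning pointed out that the Microsoft Transact-SQL docs seem to say the opposite. I would suggest that this is an error, or at least that the docs are unclear.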

score 69 · Accepted Answer

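If you were working with me, and you were working with Oracle, I would probably make you use varchar in almost every circumstance. The assumption that char uses less processing power than varchar may be true for now, but database engines get better over time, and this sort of general rule has the makings of a future "myth".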

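Another thing: I have never seen a performance problem because someone decided to go with varchar. You will make much better use of your time writing good code (fewer calls to the database) and efficient SQL (how indexes work, how the optimizer makes decisions, why exists is usually faster than in, and so on).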

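Final thought: I have seen all sorts of problems with the use of CHAR: people looking for '' when they should be looking for ' ', people not trimming the trailing blanks, or bugs with Powerbuilder adding up to 2000 blanks to the value it returns from an Oracle procedure.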

score 32 · Accepted Answer

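In addition to the performance benefits, CHAR can be used to indicate that all values should be the same length (for example, a column of U.S. state abbreviations).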

score 19 · Accepted Answer

Char is a little bit faster, so if you have a column that you know will be a certain length, use char. For example, storing (M)ale/(F)emale/(U)nknown for gender, or 2 characters for a US state.

score 18 · Accepted Answer

Does NChar or Char perform better than their var alternatives?

Great question. The simple answer is yes in certain situations. Let's see if this can be explained.

Obviously we all know that if I create a table with a column of varchar(255) (let's call this column myColumn) and insert a million rows but put only a few characters into myColumn for each row, the table will be much smaller (overall number of data pages needed by the storage engine) than if I had created myColumn as char(255). Any time I do an operation (DML) on that table and request a lot of rows, it will be faster when myColumn is varchar because I don't have to move around all those "extra" spaces at the end. Move, as in when SQL Server does internal sorts such as during a distinct or union operation, or if it chooses a merge in its query plan, etc. Move could also mean the time it takes to get the data from the server to my local PC or to another computer or wherever it is going to be consumed.

But there is some overhead in using varchar. SQL Server has to use a two-byte indicator (overhead) on each row to know how many bytes that particular row's myColumn has in it. It's not the extra 2 bytes that presents the problem; it's having to "decode" the length of the data in myColumn on every row.

In my experience it makes the most sense to use char instead of varchar on columns that will be joined to in queries. For example the primary key of a table, or some other column that will be indexed: CustomerNumber on a demographic table, or CodeID on a decode table, or perhaps OrderNumber on an order table. By using char, the query engine can more quickly perform the join because it can do straight pointer arithmetic (deterministically) rather than having to move its pointers a variable number of bytes as it reads the pages. I know I might have lost you on that last sentence. Joins in SQL Server are based around the idea of "predicates." A predicate is a condition. For example myColumn = 1, or OrderNumber < 500.

So if SQL Server is performing a DML statement, and the predicates, or "keys" being joined on are a fixed length (char), the query engine doesn't have to do as much work to match rows from one table to rows from another table. It won't have to find out how long the data is in the row and then walk down the string to find the end. All that takes time.
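
The "pointer arithmetic" idea can be illustrated with a toy Python model. This is not how SQL Server actually lays out or reads its pages; it is a minimal sketch, assuming a page that is just a flat byte string of keys, with a hypothetical 5-byte fixed width in one layout and a 2-byte length prefix per key in the other.

```python
import struct

keys = ["A1", "B22", "C333", "D4444"]
WIDTH = 5  # hypothetical CHAR(5) key column

# Fixed-width layout: every key is space-padded to WIDTH bytes.
fixed_page = b"".join(k.ljust(WIDTH).encode("ascii") for k in keys)

# Variable-width layout: each key is preceded by a 2-byte length prefix.
var_page = b"".join(struct.pack("<H", len(k)) + k.encode("ascii") for k in keys)


def read_fixed(page: bytes, index: int) -> str:
    """Jump straight to the record: offset = index * WIDTH."""
    start = index * WIDTH
    return page[start:start + WIDTH].decode("ascii").rstrip(" ")


def read_variable(page: bytes, index: int) -> str:
    """Walk the page, decoding each length prefix until the record is reached."""
    offset = 0
    for _ in range(index):
        (length,) = struct.unpack_from("<H", page, offset)
        offset += 2 + length
    (length,) = struct.unpack_from("<H", page, offset)
    return page[offset + 2:offset + 2 + length].decode("ascii")


print(read_fixed(fixed_page, 3), read_variable(var_page, 3))
```

Both functions return the same key, but the fixed-width reader computes the offset in one step while the variable-width reader has to decode every preceding length first. Note the rstrip in read_fixed: the fixed layout hands back padded values that the caller has to trim.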

Now bear in mind this can easily be poorly implemented. I have seen char used for primary key fields in online systems. The width must be kept small i.e. char(15) or something reasonable. And it works best in online systems because you are usually only retrieving or upserting a small number of rows, so having to "rtrim" those trailing spaces you'll get in the result set is a trivial task as opposed to having to join millions of rows from one table to millions of rows on another table.

Another reason CHAR makes sense over varchar on online systems is that it reduces page splits. By using char, you are essentially "reserving" (and wasting) that space so if a user comes along later and puts more data into that column SQL has already allocated space for it and in it goes.

Another reason to use CHAR is similar to the second reason. If a programmer or user does a "batch" update to millions of rows, adding some sentence to a note field for example, you won't get a call from your DBA in the middle of the night wondering why their drives are full. In other words, it leads to more predictable growth of the size of a database.

So those are 3 ways an online (OLTP) system can benefit from char over varchar. I hardly ever use char in a warehouse/analysis/OLAP scenario because usually you have SO much data that all those char columns can add up to lots of wasted space.

Keep in mind that char can make your database much larger but most backup tools have data compression so your backups tend to be about the same size as if you had used varchar. For example LiteSpeed or RedGate SQL Backup.

Another use is in views created for exporting data to a fixed-width file. Let's say I have to export some data to a flat file to be read by a mainframe. It is fixed width (not delimited). I like to store the data in my "staging" table as varchar (thus consuming less space in my database) and then use a view to CAST everything to its char equivalent, with the length corresponding to the width of that column in the fixed-width file. For example:

-- Staging table keeps the data as varchar to save space.
create table tblStagingTable (
    pkID BIGINT IDENTITY(1,1),
    CustomerFirstName varchar(30),
    CustomerLastName varchar(30),
    CustomerCityStateZip varchar(100),
    CustomerCurrentBalance money )
GO

insert into tblStagingTable
(CustomerFirstName, CustomerLastName, CustomerCityStateZip, CustomerCurrentBalance)
values ('Joe', 'Blow', '123 Main St Washington, MD 12345', 123.45)
GO

-- The view casts each column to CHAR at the width used in the fixed-width file.
create view vwStagingTable AS
SELECT CustomerFirstName = CAST(CustomerFirstName as CHAR(30)),
    CustomerLastName = CAST(CustomerLastName as CHAR(30)),
    CustomerCityStateZip = CAST(CustomerCityStateZip as CHAR(100)),
    CustomerCurrentBalance = CAST(CAST(CustomerCurrentBalance as NUMERIC(9,2)) AS CHAR(10))
FROM tblStagingTable
GO

SELECT * from vwStagingTable

This is cool because internally my data takes up less space because it's using varchar. But when I use DTS or SSIS or even just a cut and paste from SSMS to Notepad, I can use the view and get the right number of trailing spaces. In DTS we used to have a feature called, damn I forget, I think it was called "suggest columns" or something. In SSIS you can't do that anymore; you have to tediously define the flat file connection manager. But since you have your view set up, SSIS can know the width of each column, and it can save a lot of time when building your data flow tasks.
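
For readers who want to see what the resulting fixed-width line looks like, here is a minimal Python sketch of the same padding. It is an illustration only, not part of the SQL Server workflow described above; the column widths mirror the CHAR lengths used in the view, and the function name is hypothetical.

```python
# Column widths mirror the CAST(... AS CHAR(n)) lengths used in the view.
WIDTHS = {
    "CustomerFirstName": 30,
    "CustomerLastName": 30,
    "CustomerCityStateZip": 100,
    "CustomerCurrentBalance": 10,
}


def to_fixed_width(row: dict) -> str:
    """Pad every field with trailing spaces to its fixed width."""
    parts = []
    for column, width in WIDTHS.items():
        text = str(row[column])
        if len(text) > width:
            raise ValueError(f"{column} is longer than {width} characters")
        parts.append(text.ljust(width))
    return "".join(parts)


row = {
    "CustomerFirstName": "Joe",
    "CustomerLastName": "Blow",
    "CustomerCityStateZip": "123 Main St Washington, MD 12345",
    "CustomerCurrentBalance": "123.45",  # already formatted to two decimals
}
line = to_fixed_width(row)
print(len(line))
```

Every line comes out at the same length (the sum of the widths), which is what a fixed-width consumer such as the mainframe expects.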

So bottom line... use varchar. There are a very small number of reasons to use char, and they are all about performance. If you have a system with hundreds of millions of rows you will see a noticeable difference if the predicates are deterministic (char), but for most systems using char is simply wasting space.

Hope that helps. Jeff

score 9 · Accepted Answer

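There are performance benefits, but here is one that has not been mentioned: row migration. With char, you reserve the entire space in advance. So let's say you have a char(1000) and you store 10 characters: you will use up all 1000 characters of space. In a varchar2(1000), you will only use 10 characters. The problem comes when you modify the data. Let's say you update the column to now contain 900 characters. It is possible that the space to expand the varchar is not available in the current block. In that case, the DB engine must migrate the row to another block and leave a pointer in the original block to the new row in the new block. To read this data, the DB engine will now have to read 2 blocks.
No one can say unequivocally that varchar or char is better. There is a space-for-time tradeoff, and you have to consider whether the data will be updated, especially if there is a good chance that it will grow.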
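
Row migration can be sketched with a toy Python model. Everything here is an assumption made for illustration (a 100-byte block, rows tracked only by size, no pointer cost); it is not Oracle's block format, only the bookkeeping behind the argument above.

```python
BLOCK_SIZE = 100  # hypothetical block capacity in bytes


class Block:
    def __init__(self):
        self.rows = {}  # row id -> bytes currently used by that row

    def free(self) -> int:
        return BLOCK_SIZE - sum(self.rows.values())


def update_row(block: Block, overflow: Block, row_id: int, new_size: int) -> bool:
    """Grow a row in place if it fits; otherwise migrate it.

    Returns True when the row had to be moved to the overflow block.
    """
    old_size = block.rows[row_id]
    if new_size - old_size <= block.free():
        block.rows[row_id] = new_size
        return False
    # Not enough room in this block: the row moves and a pointer stays behind
    # (the pointer itself is not modelled here).
    del block.rows[row_id]
    overflow.rows[row_id] = new_size
    return True


block, overflow = Block(), Block()
for row_id in range(9):
    block.rows[row_id] = 10  # nine short variable-length rows

migrated = update_row(block, overflow, 0, 90)  # row 0 grows from 10 to 90 bytes
print(migrated)
```

A fixed-width row never changes size on update, so in this model it never triggers the migration branch; the price is that far fewer rows fit in each block to begin with.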

score 8 · Accepted Answer

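There is a difference between premature performance optimization and using a best-practice type of rule. If you are creating new tables where a field will always have a fixed length, it makes sense to use CHAR, and you should be using it in that case. This isn't premature optimization, but rather implementing a rule of thumb (or best practice).

That is, if you have a 2-letter state field, use CHAR(2). If you have a field with the actual state names, use VARCHAR.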

score 8 · Accepted Answer

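I would choose varchar unless the column stores a fixed-length value like a US state code, which is always 2 characters long, and the list of valid US state codes doesn't change often :).

In every other case, even when storing a hashed password (which is fixed length), I would choose varchar.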

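Why: a char column is always padded with spaces, so for a column my_column defined as char(5) holding the value 'ABC', the comparison

my_column = 'ABC' -- my_column stores 'ABC  ', which is different from 'ABC'

is false whenever the comparison treats trailing spaces as significant (this depends on the engine and on the types being compared).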
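
The two possible outcomes can be modelled in a few lines of Python. This is a sketch of the two comparison semantics, not a statement about which one a particular database applies; the function names are made up for the illustration.

```python
def store_char(value: str, width: int) -> str:
    """Model a CHAR(width) column: the stored value is padded with spaces."""
    return value.ljust(width)


def equals_no_pad(left: str, right: str) -> bool:
    """Exact comparison; trailing spaces are significant."""
    return left == right


def equals_pad_space(left: str, right: str) -> bool:
    """Pad-space comparison: trailing spaces are ignored on both sides."""
    return left.rstrip(" ") == right.rstrip(" ")


my_column = store_char("ABC", 5)
print(repr(my_column))
print(equals_no_pad(my_column, "ABC"))
print(equals_pad_space(my_column, "ABC"))
```

Which of the two functions matches a given query depends on the engine, the column and parameter types, and sometimes the client library, which is exactly why these bugs are hard to spot in testing.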

This behavior can lead to many irritating bugs during development and makes testing harder.

score 6 · Accepted Answer

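CHAR takes up less storage space than VARCHAR if all your data values in that field are the same length. Now perhaps in 2009 an 800GB database is, for all intents and purposes, the same as the 810GB one you would get by converting the VARCHARs to CHARs, but for short strings (1 or 2 characters) I would say CHAR is still an industry "best practice".

Now if you look at the wide variety of data types most databases provide even for integers alone (bit, tiny, int, bigint), there are reasons to choose one over the other. Simply choosing bigint every time is actually being a bit ignorant of the purposes and uses of the field. If a field simply represents a person's age in years, a bigint is overkill. It's not necessarily "wrong", but it's not efficient.

But it's an interesting argument, and as databases improve over time, it could be argued that CHAR vs VARCHAR becomes less relevant.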

score 4 · Accepted Answer

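I stand by Jim McKeeth's comment.

Also, indexing and full table scans are faster if your table has only CHAR columns. Basically, the optimizer can predict how big each record is if it only has CHAR columns, whereas it needs to check the size value of every VARCHAR column.

Besides, if you update a VARCHAR column to a size larger than its previous content, you may force the database to rebuild its indexes (because you forced the database to physically move the record on disk). With CHAR columns that will never happen.

But you probably won't care about the performance hit unless your table is huge.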

Remember the wise words usually attributed to Donald Knuth: premature optimization is the root of all evil.

score 4 · Accepted Answer

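Many people have pointed out that if you know the exact length of the value, using CHAR has some benefits. But while storing US states as CHAR(2) is great today, when you get the message from sales that "We have just made our first sale to Australia", you are in a world of pain. I always tend to overestimate how long I think fields will need to be, rather than making an "exact" guess, to cover for future events. VARCHAR gives me more flexibility in this area.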

score 3 · Accepted Answer

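I think in your case there is probably no reason not to pick Varchar. It gives you flexibility and, as a number of respondents have mentioned, performance is such now that, except in very specific circumstances, we mere mortals (as opposed to Google DBAs) will not notice the difference.

An interesting thing worth noting when it comes to DB types is that SQLite (a popular mini database with pretty impressive performance) uses dynamic typing: declared column types are only hints, and values are typed on the fly.

I always use VarChar and usually make it much bigger than I might strictly need. As you say, why not, just to be safe?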

score 2 · Accepted Answer

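There is some small processing overhead in calculating the actual size needed for a column value and allocating the space for a Varchar, so if you are definitely sure how long the value will always be, it is better to use Char and avoid the hit.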

score 2 · Accepted Answer

Fragmentation. Char reserves space and VarChar does not. A page split can be required to accommodate an update to a varchar.

score 2 · Accepted Answer

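It's the classic space versus performance tradeoff.

In MS SQL Server 2005, Varchar (or NVarchar for languages requiring two bytes per character, e.g. Chinese) is variable length. If you add to the row after it has been written to the hard disk, the data will be placed in a location that is not contiguous with the original row, leading to fragmentation of your data files. This will affect performance.

So, if space is not an issue then Char is better for performance, but if you want to keep the database size down then varchar is better.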

score 1 · Accepted Answer

When using varchar values, SQL Server needs an additional 2 bytes per row to store some information about that column, whereas if you use char it doesn't need that.

score 0 · Accepted Answer

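In some SQL databases, VARCHAR is padded out to its maximum size in order to optimize the offsets. This is done to speed up full table scans and indexes.

Because of this, in those databases you do not get any space savings by using a VARCHAR(200) compared to a CHAR(200).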

score 0 · Accepted Answer

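Using CHAR (NCHAR) versus VARCHAR (NVARCHAR) changes the way the database server stores the data. The former introduces trailing blanks, and I have run into a problem when using it with the LIKE operator in SQL Server functions. So I play it safe by using VARCHAR (NVARCHAR) all the time.

For example, suppose we have a table TEST(ID INT, Status CHAR(1)) and you write a function to list all the records with some specific value, like the following:

CREATE FUNCTION List(@Status AS CHAR(1) = '')
RETURNS TABLE
AS
RETURN
SELECT * FROM TEST
WHERE Status LIKE '%' + @Status + '%' -- with CHAR(1), '' is padded to ' '

In this function we expect that passing the default parameter will return all the rows, but in fact it does not. Changing the @Status data type to VARCHAR fixes the issue.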
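
The effect can be reproduced outside the database with a minimal Python model of LIKE. This is a sketch under stated assumptions, not SQL Server's implementation: the like helper handles only % and _, and the CHAR(1) parameter is modelled simply by padding the empty string to one space.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Minimal LIKE: '%' matches any run of characters, '_' any single one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def list_rows(statuses, status_param: str, param_is_char: bool):
    """Model the function body: WHERE Status LIKE '%' + @Status + '%'."""
    if param_is_char:
        # A CHAR(1) parameter holding '' is padded to a single space.
        status_param = status_param.ljust(1)
    pattern = "%" + status_param + "%"
    return [s for s in statuses if like(s, pattern)]


statuses = ["A", "B", "C"]
print(list_rows(statuses, "", param_is_char=True))
print(list_rows(statuses, "", param_is_char=False))
```

With the CHAR(1) parameter the pattern becomes '% %', which only matches values containing a space; with VARCHAR it is '%%', which matches every row.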
