1

誰かがこの形式を認識していますか (一番下の貼り付けを参照)? Répertoire de vettes-matière (RVM) からのものです。次の 2 つのどちらでもありません。

https://github.com/LibreCat/Catmandu-MARC/issues/88として投稿された Perl でプログラミングできます。

XS::JSON だけでハックできますが、この奇妙なアクセント エンコーディングを処理する方法がわかりません (325 からいくつかのサンプル行を示します)。

{grave}e
{ring}Z
{ringb}h
{ringb}s
{rlig}a
{rlig}A

奇妙なMARC JSONは次のとおりです。

{
"rows" : [
{
    "RecordNumber" : "1",
    "Tag" : "LDR",
    "Indicators" : "",
    "Content" : "00533nz   2200205n  4500"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "001",
    "Indicators" : "\"  \"",
    "Content" : "201-0000001"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "005",
    "Indicators" : "\"  \"",
    "Content" : "20121025110000.0"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "008",
    "Indicators" : "\"  \"",
    "Content" : "790704\\nfanvnnbabn\\\\\\\\\\\\\\\\\\\\\\b\\ana\\\\\\\\\\\\"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "016",
    "Indicators" : "\\\\",
    "Content" : "$a0509B3366"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "035",
    "Indicators" : "\\\\",
    "Content" : "$a(ISM)8013850"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "035",
    "Indicators" : "9\\",
    "Content" : "$a201-0000001"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "040",
    "Indicators" : "\\\\",
    "Content" : "$aCaQQLa$bfre"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "150",
    "Indicators" : "\\\\",
    "Content" : "$aAlg{grave}ebres de Von Neumann"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "450",
    "Indicators" : "\\\\",
    "Content" : "$wnne$aVon Neumann, Alg{grave}ebres de"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "450",
    "Indicators" : "\\\\",
    "Content" : "$aW*-alg{grave}ebres"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "550",
    "Indicators" : "\\\\",
    "Content" : "$wg$aC*-alg{grave}ebres"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "550",
    "Indicators" : "\\\\",
    "Content" : "$wg$aEspace de Hilbert"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "697",
    "Indicators" : "\\\\",
    "Content" : "$amm."
}
,
{
    "RecordNumber" : "1",
    "Tag" : "750",
    "Indicators" : "\\7",
    "Content" : "$aVon Neumann, Alg{grave}ebres de$2ram"
}
,
{
    "RecordNumber" : "1",
    "Tag" : "750",
    "Indicators" : "\\0",
    "Content" : "$aVon Neumann algebras"
}
]
}

追加: このアクセント エンコーディングは MARCmkr からのものです。私は以下を使用しました:

use MARC::File::MARCMaker; # https://metacpan.org/pod/MARC::File::MARCMaker
# for some reason can't be found by module name, so use:
# cpanm http://www.cpan.org/authors/id/E/EI/EIJABB/MARC-File-MARCMaker-0.05.tar.gz
my $marc_charset = MARC::File::MARCMaker::usmarc_default();
$content = MARC::File::MARCMaker::_maker2char ($content, $marc_charset);

しかし、たとえばこのテキストhttps://github.com/gmcharlt/marc-perl/blob/e8e0ecc92946d6dcb3c2270706041a30eff0f68d/marc-marcmaker/t/marcmaker.t#L92でテストすると、アクセント/合字が XML エンティティに変換されます。翻訳されたテキストをブラウザで開いてみました。一部のエンティティは解釈されず、次の文字にアクセントが付けられていません。したがって、翻訳を完了するには、「XML から Unicode へ」モジュールを使用する必要があると思います。

This a test of diacritics like the uppercase Polish L in
Ł´od´z, the uppercase Scandinavia O in &Ostrok;st, the
uppercase D with crossbar in Đuro, the uppercase Icelandic
thorn in Þann, the uppercase digraph AE in Ægir, the
uppercase digraph OE in Œuvres, the soft sign in
rech&softsign;, the middle dot in col·lecci´o, the musical
flat in F♭, the patent mark in Frizbee®, the plus or minus
sign in ±54%, the uppercase O-hook in B&Ohorn;, the
uppercase U-hook in X&Uhorn;A, the alif in
mas&mlrhring;alah, the ayn in &mllhring;arab, the lowercase
Polish l in Włocław, the lowercase Scandinavian o in
K&ostrok;benhavn, the lowercase d with crossbar in đavola,
the lowercase Icelandic thorn in þann, the lowercase digraph
ae in være, the lowercase digraph oe in cœur, the lowercase
hardsign in s&hardsign;ezd, the Turkish dotless i in masalı,
the British pound sign in £5.95, the lowercase eth in
verður, the lowercase o-hook (with pseudo question mark) in
S&hooka;&ohorn;, the lowercase u-hook in T&uhorn; D&uhorn;c,
the pseudo question mark in c&hooka;ui, the grave accent in
tr`es, the acute accent in d´esir´ee, the circumflex in
cˆote, the tilde in ma˜nana, the macron in T¯okyo, the breve
in russki˘i, the dot above in ˙zaba, the dieresis (umlaut)
in L¨owenbr¨au, the caron (hachek) in ˇcrny, the circle
above (angstrom) in ˚arbok, the ligature first and second
halves in d&llig;i&rlig;ad&llig;i&rlig;a, the high comma off
center in rozdel&rcommaa;ovac, the double acute in
id˝oszaki, the candrabindu (breve with dot above) in
Ali&candra;iev, the cedilla in ¸ca va comme ¸ca, the right
hook in viet˛a, the dot below in te&dotb;da, the double dot
below in &under;k&under;hu&dbldotb;tbah, the circle below in
Sa&dotb;msk&ringb;rta, the double underscore in
&dblunder;Ghulam, the left hook in Lech Wał&commab;esa, the
right cedilla (comma below) in khŗong, the upadhmaniya (half
circle below) in &breveb;humantuˇs, double tilde, first and
second halves in &ldbltil;n&rdbltil;galan, high comma
(centered) in g&commaa;eotermika.
4

1 に答える 1