[Update]
Problem
I have two databases:
1:
1 Name: D-Tagatose 1,6-bisphosphate
2 Name: 1-Phosphatidyl-D-myo-inositol;: 1-Phosphatidyl-1D-myo-inositol;: 1-Phosphatidyl- myo-inositol;: Phosphatidyl-1D-myo-inositol;: (3-Phosphatidyl)-1-D-inositol;: 1,2-Diacyl-sn-glycero-3-phosphoinositol;: Phosphatidylinositol
3 Name: Androstenedione;: Androst-4-ene-3,17-dione;: 4-Androstene-3,17-dione
4 Name: Spermine;: N,N'-Bis(3-aminopropyl)-1,4-butanediamine
5 Name: H+;: Hydron
2:
> <NAME> Benzaldehyde, 4-[(trimethylsilyl)oxy]- > <SYNONYMS> Benzaldehyde, p-(trimethylsiloxy)-
> <NAME> Benzeneacetic acid, methyl ester > <SYNONYMS> Acetic acid, phenyl-, methyl ester
> <NAME> Cyclopropaneoctanoic acid, 2-[[2-[(2-ethylcyclopropyl)methyl]cyclopropyl]methyl]-, methyl ester > <SYNONYMS> Methyl 8-[2-((2-[(2-ethylcyclopropyl)methyl]cyclopropyl)methyl)cyclopropyl]octanoate #
> <NAME> Mevalonic lactone, trimethylsilyl deriv. > <SYNONYMS> Mevalonic lactone, trimethylsilyl
> <NAME> Benzeneacetic acid, phenylmethyl ester > <SYNONYMS> Acetic acid, phenyl-, benzyl ester
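Desired output:
Match the names or synonyms in database 2 against the names in database 1. Because these are chemical compounds, the same compound can appear under slightly different names, so I also used the linked online databases for matching.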
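To illustrate what "slightly different names" can mean in practice, here is a minimal sketch of a normalization step applied before comparing two names. The helper `normalize_name` is hypothetical (it is not part of either database or of any package), and the rules it applies (lowercasing, dropping a trailing `#`, removing whitespace) are assumptions that may be too aggressive or too weak for real chemical nomenclature.

```r
# Hypothetical helper: reduce a compound name to a comparison key.
# The rules below are assumptions, not an established standard.
normalize_name <- function(x) {
  x <- tolower(trimws(x))      # ignore case and surrounding blanks
  x <- sub("\\s*#$", "", x)    # drop a trailing " #" marker
  x <- gsub("\\s+", "", x)     # remove all remaining whitespace
  x
}

a <- "1-Phosphatidyl- myo-inositol"   # stray space after the hyphen
b <- "1-Phosphatidyl-myo-inositol"

a == b                                   # plain comparison fails
normalize_name(a) == normalize_name(b)   # keys are identical
```

Normalization of this kind only handles formatting differences; true synonyms (for example a trivial name versus a systematic name) still need a synonym list or an external identifier.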
Test input:
Please see the linked Excel file: data
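What have I tried?
Matching on names only (the "Name: " prefix has to be stripped from the names in db 1)
Partial name matching -> clearly not the best idea for chemical names.
Matching via the linked online databases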
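The name-only attempt can be sketched roughly as follows, using base R only. It assumes that db 1 entries always start with `Name: ` and separate synonyms with `;: `, as in the sample rows. The vector `db2` is made up for illustration (one differently cased hit, one exact hit, one miss), and `lookup` and `result` are hypothetical names.

```r
# Two db 1 entries in the format shown in the question.
db1 <- c(
  "Name: Androstenedione;: Androst-4-ene-3,17-dione;: 4-Androstene-3,17-dione",
  "Name: H+;: Hydron"
)

# Strip the leading "Name: " and split each entry on the ";: " separator.
db1_names <- strsplit(sub("^Name:\\s*", "", db1), ";:\\s*")

# Long lookup table: one row per (db 1 entry, name or synonym).
lookup <- data.frame(
  db1_row = rep(seq_along(db1_names), lengths(db1_names)),
  name    = trimws(unlist(db1_names)),
  stringsAsFactors = FALSE
)

# Illustrative db 2 names/synonyms (assumed values, not real query data).
db2 <- c("Hydron", "4-androstene-3,17-dione", "Methamidophos")

# Case-insensitive exact match; NA marks names without a counterpart.
hit <- match(tolower(db2), tolower(lookup$name))
result <- data.frame(db2 = db2, db1_row = lookup$db1_row[hit],
                     stringsAsFactors = FALSE)
print(result)
```

Exact matching on a long table avoids the false positives of partial matching, at the cost of missing names that differ by more than case.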
Small R input:
Input 1
structure(c("> <NAME>", "> <NAME>", "> <NAME>", "> <NAME>",
"> <NAME>", "> <NAME>", "> <NAME>", "> <NAME>", "> <NAME>",
"> <NAME>", "> <NAME>", "> <NAME>", "> <NAME>", "> <NAME>",
"> <NAME>", " Benzaldehyde, 4-[(trimethylsilyl)oxy]-", " Benzeneacetic acid, methyl ester",
" Cyclopropaneoctanoic acid, 2-[[2-[(2-ethylcyclopropyl)methyl]cyclopropyl]methyl]-, methyl ester",
" Mevalonic lactone, trimethylsilyl deriv.", " Benzeneacetic acid, phenylmethyl ester",
" Butanoic acid, 3,3-dimethyl-, methyl ester", " Acetic acid, (4-(trifluoromethoxy)phenyl)methyl ester",
" Phosphoramidothioic acid, O,S-dimethyl ester", " Octanoic acid, phenylmethyl ester",
" Benzenepropanoic acid, methyl ester", " 2-Propenoic acid, 3-phenyl-, methyl ester",
" Propanoic acid, 2-methyl-, phenylmethyl ester", " Acetic acid, (2,3-dichlorophenyl)methyl ester",
" L-Methionine, methyl ester", " Butanoic acid, phenylmethyl ester",
"<SYNONYMS>", "<SYNONYMS>", "<SYNONYMS>", "<SYNONYMS>", "<SYNONYMS>",
"<SYNONYMS>", "> <SYNONYMS>", "<SYNONYMS>", "<SYNONYMS>", "<SYNONYMS>",
"<SYNONYMS>", "<SYNONYMS>", "> <SYNONYMS>", "<SYNONYMS>", "<SYNONYMS>",
" Benzaldehyde, p-(trimethylsiloxy)-", " Acetic acid, phenyl-, methyl ester",
" Methyl 8-[2-((2-[(2-ethylcyclopropyl)methyl]cyclopropyl)methyl)cyclopropyl]octanoate #",
" Mevalonic lactone, trimethylsilyl", " Acetic acid, phenyl-, benzyl ester",
" Butyric acid, 3,3-dimethyl-, methyl ester", " NA", " Methamidophos",
" Octanoic acid, benzyl ester", " Hydrocinnamic acid, methyl ester",
" Cinnamic acid, methyl ester", " Isobutyric acid, benzyl ester",
" NA", " Methyl 2-amino-4-(methylsulfanyl)butanoate #", " Butyric acid, benzyl ester"
), .Dim = c(15L, 4L), .Dimnames = list(c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"),
c("NAME", NA, "NA.1", "NA.2")))
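Input 1 is a four-column character matrix in which columns 1 and 3 hold only the `<NAME>` / `<SYNONYMS>` tags and missing synonyms are stored as the string `" NA"`. A possible way to tidy it before matching is sketched below on a two-row subset with the same layout; `input1_small` and `db2` are hypothetical names, and treating the literal text "NA" as missing is an assumption.

```r
# Two rows copied from Input 1, kept in the same 4-column layout.
input1_small <- matrix(
  c("> <NAME>", "> <NAME>",
    " Benzeneacetic acid, phenylmethyl ester",
    " Acetic acid, (2,3-dichlorophenyl)methyl ester",
    "<SYNONYMS>", "> <SYNONYMS>",
    " Acetic acid, phenyl-, benzyl ester", " NA"),
  nrow = 2
)

# Keep only the value columns (2 = name, 4 = synonym) and trim blanks.
db2 <- data.frame(
  name    = trimws(input1_small[, 2]),
  synonym = trimws(input1_small[, 4]),
  stringsAsFactors = FALSE
)

# Assumption: the literal string "NA" means "no synonym available".
db2$synonym[db2$synonym == "NA"] <- NA_character_
print(db2)
```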
Input 2
structure(c("Name: 1-Phosphatidyl-D-myo-inositol;: 1-Phosphatidyl-1D-myo-inositol;: 1-Phosphatidyl-myo-inositol;: Phosphatidyl-1D-myo-inositol;: (3-Phosphatidyl)-1-D-inositol;: 1,2-Diacyl-sn-glycero-3-phosphoinositol;: Phosphatidylinositol",
"Name: Androstenedione;: Androst-4-ene-3,17-dione;: 4-Androstene-3,17-dione",
"Name: Spermine;: N,N'-Bis(3-aminopropyl)-1,4-butanediamine",
"Name: H+;: Hydron", "Name: 3-Iodo-L-tyrosine", "Name: 3-Methoxytyramine",
"Name: 3-Methoxy-4-hydroxyphenylacetaldehyde;: (4-Hydroxy-3-methoxyphenyl)acetaldehyde;: Homovanillin",
"Name: L-Noradrenaline;: Noradrenaline;: Norepinephrine;: Arterenol;: 4-[(1R)-2-Amino-1-hydroxyethyl]-1,2-benzenediol",
"Name: 3,4-Dihydroxymandelaldehyde;: 3,4-Dihydroxyphenylglycolaldehyde",
"Name: L-Metanephrine", "Name: L-Adrenaline;: (R)-(-)-Adrenaline;: (R)-(-)-Epinephrine;: (R)-(-)-Epirenamine;: (R)-(-)-Adnephrine;: 4-[(1R)-1-Hydroxy-2-(methylamino)ethyl]-1,2-benzenediol",
"Name: 3-Methoxy-4-hydroxyphenylglycolaldehyde", "Name: L-Normetanephrine",
"Name: L-Dopachrome;: 2-L-Carboxy-2,3-dihydroindole-5,6-quinone",
"Name: 5,6-Dihydroxyindole;: DHI"), .Dim = c(15L, 1L))