java - Parsing a text file into multiple text files

Question

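I want to parse an input file in Java to produce multiple files. The input file contains thousands of protein sequences in FASTA format, and I want to generate the raw form of each protein sequence (that is, without commas, semicolons, or extra symbols such as ">", "[" and "]").

A FASTA sequence starts with a ">" symbol, followed by the protein description and then the protein sequence.

Example ► >lcl|NC_000001.10_cdsid_XP_003403591.1 [gene=LOC100652771] [protein=hypothetical protein LOC100652771] [protein_id=XP_003403591.1] [location=join(12190..12227,12595..12721,113403). ] MSESINFSHNLGQLLSPPRCVVMPGMPFPSIRSPELQKTTADLDHTLVSVPSVAESLHHPEITFLTAFCL PSFTRSRPLPDRQLHHCLALCPSFALPAGDGVCHGPGLQGSCYKGETQESVESRVLPGPRHRH

The input file contains thousands of protein sequences in the format shown above. I need to generate thousands of raw files, each containing a single protein sequence with no special symbols or gaps.

I wrote the code in Java, but the output is "cannot open a file", followed by "cannot find file".

Please help me solve my problem.

Regards, Vijay Kumar Garg, Varanasi, Bharat (India)

The code is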

/*Java code to convert FASTA format to a raw format*/
import java.io.*;
import java.util.*;
import java.util.regex.*;
import java.io.FileInputStream;

// java package for using regular expression
public class Arrayren
{
    public static void main(String args[]) throws IOException  
    {
        String a[]=new String[1000];
        String b[][] =new String[1000][1000];
        /*open the id file*/
        try
        {
            File f = new File ("input.txt"); 
            //opening the text document containing genbank ids
            FileInputStream fis = new FileInputStream("input.txt");
            //Reading the file contents through inputstream
            BufferedInputStream bis = new BufferedInputStream(fis);
            // Writing the contents to a buffered stream
            DataInputStream dis = new DataInputStream(bis);
            //Method for reading Java Standard data types
            String inputline;
            String line;
            String separator = System.getProperty("line.separator");
            // reads a line till next line operator is found
            int i=0;
            while ((inputline=dis.readLine()) != null) 
            {
                i++;
                a[i]=inputline;
                a[i]=a[i].replaceAll(separator,"");
                //replaces unwanted patterns like /n with space
                a[i]=a[i].trim();
                // trims out if any space is available
                a[i]=a[i]+".txt";
                //takes the file name into an array
                try
                // to handle run time error
                /*take the sequence in to an array*/
                {
                    BufferedReader in = new BufferedReader (new FileReader(a[i]));
                    String inline = null;
                    int j=0;
                    while((inline=in.readLine()) != null)
                    {
                        j++;
                        b[i][j]=inline;
                        Pattern q=Pattern.compile(">");
                        //Compiling the regular expression
                        Matcher n=q.matcher(inline);
                        //creates the matcher for the above pattern
                        if(n.find())
                        {
                            /*appending the comment line*/
                            b[i][j]=b[i][j].replaceAll(">gi","");
                            //identify the pattern and replace it with a space
                            b[i][j]=b[i][j].replaceAll("[a-zA-Z]","");
                            b[i][j]=b[i][j].replaceAll("|","");
                            b[i][j]=b[i][j].replaceAll("\\d{1,15}","");
                            b[i][j]=b[i][j].replaceAll(".","");
                            b[i][j]=b[i][j].replaceAll("_","");
                            b[i][j]=b[i][j].replaceAll("\\(","");
                            b[i][j]=b[i][j].replaceAll("\\)","");
                        }
                        /*printing the sequence in to a text file*/
                        b[i][j]=b[i][j].replaceAll(separator,"");
                        b[i][j]=b[i][j].trim();
                        // trims out if any space is available
                        File create = new File(inputline+"R.txt");
                        try
                        {
                            if(!create.exists())
                            {
                                create.createNewFile();
                                // creates a new file
                            }
                            else
                            {
                                System.out.println("file already exists");
                            }
                        }
                        catch(IOException e)
                        // to catch the exception and print the error if cannot open a file
                        {
                            System.err.println("cannot create a file");
                        }
                        BufferedWriter outt = new BufferedWriter(new FileWriter(inputline+"R.txt", true));
                        outt.write(b[i][j]);
                        // printing the contents to a text file
                        outt.close();
                        // closing the text file
                        System.out.println(b[i][j]);
                    }
                }
                catch(Exception e)
                {
                    System.out.println("cannot open a file");
                }
            }
        }
        catch(Exception ex)
        // catch the exception and prints the error if cannot find file
        {
            System.out.println("cannot find file ");
        }
    }
}

A precise explanation would make it easier for me to understand.

score 0 · Accepted Answer

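This code will not win any prizes, owing to a lack of Java expertise. For instance, I would expect an OutOfMemory error even if it were correct. A rewrite would be best. Still, we all started small.

Give the full path to the file. On the output side, too, the directory is probably missing from the file name.
Better to use BufferedReader and the like.
Initialize i with -1. Better still, use for (int i = 0; i < a.length; ++i).
Compiling the Pattern outside the loop is best, but the Matcher can be dropped altogether: you can just as well write if (s.contains(">")). There is no need to create the new file beforehand, because FileWriter creates it if it does not exist.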
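
The points about the Pattern and the header check can be sketched as follows. This is only an illustration, not the asker's or the answerer's code: the class name HeaderCheckSketch and its two helper methods are hypothetical, and it assumes that a header line begins with ">" as described in the question. It also shows why the question's replaceAll(".","") call wipes out the whole line: replaceAll interprets its first argument as a regular expression, so "." matches every character, whereas replace works on literal text.

```java
import java.util.regex.Pattern;

class HeaderCheckSketch {
    // Compiled once, outside any loop: matches every character that is not a letter.
    private static final Pattern NON_LETTERS = Pattern.compile("[^A-Za-z]");

    // A header line is recognised by its leading '>'; no regex is needed for this.
    static boolean isHeader(String line) {
        return line.startsWith(">");
    }

    // Keeps only the letters of a sequence line (drops spaces, commas, semicolons, digits).
    static String cleanSequence(String line) {
        return NON_LETTERS.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isHeader(">lcl|NC_000001.10 [gene=LOC100652771]")); // true
        System.out.println(cleanSequence("MSESINF SHNLG;QLL,"));               // MSESINFSHNLGQLL

        // replaceAll takes a regex, so "." matches any character and removes everything.
        System.out.println("A.B".replaceAll(".", "").isEmpty());               // true
        // replace takes literal text, so only the dot is removed.
        System.out.println("A.B".replace(".", ""));                            // AB
    }
}
```

For the same reason, replaceAll("|","") in the question matches only the empty string and leaves the "|" characters in place; a literal replace("|", "") removes them.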

Code:

final String encoding = "Windows-1252"; // Or "UTF-8", or omit the encoding.
File f = new File("C:/input.txt");
BufferedReader dis = new BufferedReader(new InputStreamReader(
    new FileInputStream(f), encoding));

...

        int i= -1; // So i++ starts with 0.
        while ((inputline=dis.readLine()) != null) 
        {
            i++;
            a[i]=inputline.trim();
            // readLine() already strips the line terminator, so this is not needed:
            // a[i]=a[i].replaceAll(separator,"");
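
Since a rewrite is suggested above, here is one possible shape for it, offered as a rough sketch rather than a definitive implementation. Several things in it are assumptions that do not come from the question: each record's header is taken to sit on its own line starting with ">", with the sequence on the following lines; the input is read as UTF-8; and the class name FastaSplitSketch, the output directory "raw" and the file names seq1.txt, seq2.txt, ... are invented for illustration. Records are streamed one line at a time, so no large arrays are needed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FastaSplitSketch {

    // Reads a multi-record FASTA file and writes each sequence, reduced to its
    // letters, into its own file inside outDir. Returns the number of records.
    static int split(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        int count = 0;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.startsWith(">")) {
                    // A header starts a new record: finish the previous file first.
                    if (out != null) {
                        out.close();
                    }
                    count++;
                    // Hypothetical naming scheme: seq1.txt, seq2.txt, ...
                    out = Files.newBufferedWriter(
                            outDir.resolve("seq" + count + ".txt"), StandardCharsets.UTF_8);
                } else if (out != null) {
                    // Sequence line: keep letters only, so gaps and symbols disappear.
                    out.write(line.replaceAll("[^A-Za-z]", ""));
                }
                // Lines before the first header are ignored.
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Pass full paths as arguments; the defaults are relative to the working directory.
        Path input = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path outDir = Paths.get(args.length > 1 ? args[1] : "raw");
        int n = split(input, outDir);
        System.out.println(n + " sequences written to " + outDir.toAbsolutePath());
    }
}
```

Printing the absolute output path makes it easy to check where the files actually end up, which relates to the "cannot find file" symptom: a relative name such as "input.txt" is resolved against the directory the program is started from.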
