I have a FASTA file. From that file, I need to extract only the sequences that have GTACAGTAGG and CAACGGTTTTGCC at the start and/or end, and write them to a new FASTA file. Here's an example:
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/7/2516_3269
***GTACAGTAGG***GTACACACAGAACGCGACAAGGCCAGGCGCTGGAGGAACTCCAGCAGCTAGATGCAAGCGACTA
TCAGAGCGTTGGGTCCAGAACGAAGAACAGTCACTCAAGACTGCTTT***CAACGGTTTTGCC***
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/7/3312_3597
CGCGGCATCGAATTAATACGACTCACTATAGGTTTTTTTATTGGTATTTTCAGTTAGATTCTTTCTTCTTAGAGGGTACA
GAGAAAGGGAGAAAATAGCTACAGACATGGGAGTGAAAGGTAGGAAGAAGAGCGAAGCAGACATTATTCA
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/7/3708_4657
***CAACGGTTTTGCC***ACAAGATCAGGAACATAAGTCACCAGACTCAATTCATCCCCATAAGACCTCGGACCTCTCA
ATCCTCGAATTAGGATGTTCTCCCCATGGCGTACGGTCTATCAGTATATAAACCTGACATACTATAAAAAAGTATACCAT
TCTTATCATGTACAGTAGG***GTACAGTAGG***
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/7/4704_5021
***GTACAGTAGG***GTGGGAGAGATGGCAGAAAGGCAGAAAGGAGAAAGATTCAGGATAACTCTCCTGGAGGGGCGAG
GTGCCATTCCCTGTGGTCACTTATTCTAAAGGCCCCAACCCTTCAAC***CAACGGTTTTGCC***
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/8/4223_4358
AAATATTGGGTCAAAGAACCGTTACTTTTCTTATATATGCGGCGCGAGGTTTTATATACTGATAAGAACCTACGCCATGG
GACATCTAATTCAGAGGGAAGAAGGTCCATGTCTGTTTGGATGAAATTGAGTCTG
(*** added for highlighting)
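The "and/or" in the request can be read two ways: keep a record if either motif sits at either end, or keep it only if there is a motif at the start and also one at the end. In the example above, records 1, 3 and 4 have one motif at each end and records 2 and 5 have none at either end, so both readings select the same three records there. They can still differ on other reads. The minimal Python sketch below shows the two readings side by side. The name `ends_match`, its `mode` parameter, and the choice to uppercase the sequence first are illustrative assumptions. The toy sequences are made up and are not from the file.

```python
# Minimal sketch of the end-matching rule (names are hypothetical).
MOTIFS = ("GTACAGTAGG", "CAACGGTTTTGCC")


def ends_match(seq: str, mode: str = "either") -> bool:
    """Return True if `seq` carries a motif at its start and/or end.

    mode="either": a motif at the start OR a motif at the end.
    mode="both":   a motif at the start AND a motif at the end
                   (the same motif or different ones).
    """
    seq = seq.upper()  # assumption: treat lower-case bases the same as upper-case
    # str.startswith / str.endswith accept a tuple and test every element
    at_start = seq.startswith(MOTIFS)
    at_end = seq.endswith(MOTIFS)
    if mode == "either":
        return at_start or at_end
    if mode == "both":
        return at_start and at_end
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up toy sequences, only to illustrate the rule
print(ends_match("GTACAGTAGGAAAACAACGGTTTTGCC"))           # True
print(ends_match("GTACAGTAGGAAAA"))                         # True
print(ends_match("GTACAGTAGGAAAA", mode="both"))            # False
print(ends_match("AAAAGTACAGTAGGAAAA"))                     # False: motif only in the middle
```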
Is there a way to do this? I'm very new to this and not even sure it can be done. Thanks in advance for any help you can give.
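Here is a minimal sketch of how this could be done in Python using only the standard library. It assumes that the `***` marks are highlighting only and do not appear in the real file. It also assumes the looser reading of the request: a record is kept if either motif is at the start or at the end of its sequence. Because the example wraps sequences over several lines, the lines of each record are joined before the ends are checked. The script name `filter_ends.py` and all function names are hypothetical.

```python
#!/usr/bin/env python3
"""Copy FASTA records whose sequence starts or ends with a motif.

Minimal sketch; script and function names are hypothetical.
Usage: python filter_ends.py input.fasta output.fasta
"""
import sys
from typing import Iterable, Iterator, TextIO, Tuple

MOTIFS = ("GTACAGTAGG", "CAACGGTTTTGCC")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # last record in the file


def keep(seq: str) -> bool:
    """True if a motif sits at the start or at the end of the sequence."""
    seq = seq.upper()
    return seq.startswith(MOTIFS) or seq.endswith(MOTIFS)


def filter_fasta(src: Iterable[str], dst: TextIO, width: int = 80) -> int:
    """Write matching records to `dst`, rewrapped to `width`; return the count."""
    kept = 0
    for header, seq in read_fasta(src):
        if keep(seq):
            dst.write(f">{header}\n")
            for i in range(0, len(seq), width):
                dst.write(seq[i:i + width] + "\n")
            kept += 1
    return kept


def main(in_path: str, out_path: str) -> None:
    with open(in_path) as src, open(out_path, "w") as dst:
        n = filter_fasta(src, dst)
    print(f"wrote {n} records to {out_path}", file=sys.stderr)


# Run only when both file paths are given on the command line
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as `python filter_ends.py input.fasta output.fasta`. Run on the five example records, it keeps the three whose headers end in `2516_3269`, `3708_4657` and `4704_5021`. Output sequences are rewrapped at 80 characters. If Biopython is already installed, `SeqIO.parse` and `SeqIO.write` can handle the reading and writing instead.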