Rather than loading the whole file into memory and running a regex over it, a faster approach that can handle files of any size without memory problems looks like this:
open System
open System.IO
open System.Text.RegularExpressions
// regex: beginning of line, followed by optional whitespace,
// followed by comment chars.
let reComment = Regex(@"^\s*//", RegexOptions.Compiled)
let stripComments infile outfile =
    File.ReadLines infile
    |> Seq.filter (reComment.IsMatch >> not)
    |> fun lines -> File.WriteAllLines(outfile, lines)
stripComments "input.txt" "output.txt"
The output file must be different from the input file, because we're writing to the output while we're still reading from the input. We use the regex to identify comment lines (with optional leading whitespace), and Seq.filter to make sure the comment lines don't get sent to the output file.
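To see which lines the filter keeps, the same regex can be run over a small in-memory list. This is only a sketch: the sample lines are made up for illustration, and it uses List.filter on a list instead of Seq.filter on a file.

```fsharp
open System.Text.RegularExpressions

// Same regex as above: optional leading whitespace, then "//".
let reComment = Regex(@"^\s*//", RegexOptions.Compiled)

// Hypothetical sample input, not taken from any real file.
let sample =
    [ "// a full-line comment"
      "    // an indented comment"
      "let x = 1 // a trailing comment"
      "let y = 2" ]

// Keep only the lines that do NOT start with a comment.
let kept =
    sample |> List.filter (fun line -> not (reComment.IsMatch line))

kept |> List.iter (printfn "%s")
// prints:
// let x = 1 // a trailing comment
// let y = 2
```

Note that the trailing comment on the third line survives, because the regex is anchored to the beginning of the line.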
Because we never hold the entire input or output file in memory, this function will work on any size file, and it's likely faster than the "read entire file, regex everything, write entire file" approach.
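If you do need the result to end up in the original file, one possible workaround is to stream into a temporary file and copy it back over the input once reading has finished. The helper below is a sketch under that assumption; the name stripCommentsInPlace is hypothetical, and the final copy is not atomic, so a crash halfway through the copy could leave the file incomplete.

```fsharp
open System.IO
open System.Text.RegularExpressions

let reComment = Regex(@"^\s*//", RegexOptions.Compiled)

// Hypothetical helper: strips comment lines from `path` "in place" by
// streaming to a temporary file and copying it back afterwards.
let stripCommentsInPlace (path: string) =
    // Creates an empty, uniquely named file in the temp directory.
    let tmp = Path.GetTempFileName()
    try
        let kept =
            File.ReadLines path
            |> Seq.filter (fun line -> not (reComment.IsMatch line))
        // WriteAllLines enumerates the lazy sequence, so the input is
        // read line by line and closed once this call returns.
        File.WriteAllLines(tmp, kept)
        // Reading is done, so overwriting the input is safe now.
        File.Copy(tmp, path, true)
    finally
        File.Delete tmp
```

Calling stripCommentsInPlace "input.txt" would then rewrite input.txt without its comment lines, still without holding the whole file in memory.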
Danger Ahead
This code will not strip out comments that appear after some code on the same line. However, a regular expression is not the right tool for that job, unless someone can come up with one that tells the following two lines of code apart and leaves the first one intact when everything matching the regex is stripped from the file:
let request = WebRequest.Create("http://foo.com")
let request = WebRequest.Create(inputUrl) // this used to be hard-coded
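To make the problem concrete, here is what a naive "strip from // to the end of the line" regex does to those two lines. The pattern and the name reNaive are my own, chosen only to illustrate the failure; this is not a recommended approach.

```fsharp
open System.Text.RegularExpressions

// Naive regex: everything from the first "//" to the end of the line.
let reNaive = Regex(@"//.*$")

let lines =
    [ "let request = WebRequest.Create(\"http://foo.com\")"
      "let request = WebRequest.Create(inputUrl) // this used to be hard-coded" ]

// Remove whatever the regex matches, then trim trailing whitespace.
let stripped =
    lines |> List.map (fun line -> reNaive.Replace(line, "").TrimEnd())

stripped |> List.iter (printfn "%s")
// prints:
// let request = WebRequest.Create("http:
// let request = WebRequest.Create(inputUrl)
```

The second line comes out fine, but the first is cut off in the middle of the string literal, because the regex cannot know that the // in the URL sits inside a string.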