html - データの ms アクセス解析 html ファイル

Question

ディレクトリを解析してすべての html ファイルを検索するために必要なコードは既に作成しています。ただし、必要な情報を得るために各ファイルを解析する必要があります。ストーリーのタイトル、著者、カテゴリ、章数、ソース、各ストーリーの要約を抽出し、データベースの正しいフィールドに追加する必要があります。各 html ファイルは同じ方法でセットアップされます。可能であれば、各ストーリーの単語数も取得したいと思います。単語数は、各章のテキスト領域内のすべての単語の合計になります。以下に、各 html ファイルの記述方法の概要を示します。これを達成するための最良の方法を教えてください。

<html>
<head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
    <meta name="author" content="AUTHOR">
    <title>AUTHOR: TITLE</title>
</head>
<body>
    <br/><br/>
    <div style="text-align:center">
        <h1>TITLE</h1>
    </div>
    <b>Story:</b> TITLE<br>
    <b>Storylink:</b> <a href="URL">URL
    <b>Category:</b> CATEGORY<br>
    <b>Author:</b> AUTHOR<br/>
    <b>Last updated:</b> 10/16/2011<br/>
    <b>Status:</b> STATUS<br/>
    <b>Content:</b> Chapter 1 to 16 of 16 chapters<br/>
    <b>Source:</b> SOURCE<br><br>
    <b>Summary:</b> SUMMARY

    <!--CHAPTERAREA START-->
        <h2 class=chapterffdl>*Chapter 1*: Chapter 1</h2>
        CHAPTER TEXT CHAPTER TEXT CHAPTER TEXT

        <h2 class=chapterffdl>*Chapter 2*: Chapter 2</h2>
        CHAPTER TEXT CHAPTER TEXT CHAPTER TEXT

        ...

    <!--CHAPTERAREA STOP-->

</body>
</html>

score 0 · Accepted Answer

スタート：

Sub ParseHTML()
''Requires that you add a reference to the
''Windows Script Host Object Model
''Use Tools->References

Dim fs As New FileSystemObject
Dim fl As Folder
Dim f As File
Dim ts As TextStream
Dim sList As Variant
Dim rs As Recordset

''s="create table stories (story text,author text,category text,content text,source text,summary text)
''CurrentDb.Execute S

Set rs = CurrentDb.OpenRecordset("stories")

sList = Split("story,author,category,content,source,summary", ",")

Set fl = fs.GetFolder("Z:\docs\")
For Each f In fl.Files
    If f.Type Like "*HTML*" Then
        Debug.Print f.Name
        Set ts = fs.OpenTextFile(f.Path, ForReading)
        Do While Not ts.AtEndOfStream
            a = ts.ReadLine
            For i = 0 To UBound(sList)
                If Left(Trim(a), Len(sList(i)) + 4) = "<b>" & sList(i) & ":"
                    If Trim(a) Like "<b>Story:*" Then
                        rs.AddNew
                    End If
                    rs(sList(i)) = Trim(a)
                    If Trim(a) Like "<b>Summary:*" Then
                        rs.Update
                        Exit Do
                    End If
                End If
            Next
        Loop
    End If
Next

End Sub

html - データの ms アクセス解析 html ファイル

1 に答える 1

Related

Reference