I process some text files with the code below.

Code:

$file = file($files);
$lines = str_replace("'", '', $file);
$noMultipleSpace = removeMultipleSpaces($lines);
$fileContents = array();
foreach($noMultipleSpace as $line) {
    if (isLatin($line) && count(preg_split('/\s+/', $line)) > 25) {
        $newContent = preg_split('/\\.\\s*/', $line);
        foreach($newContent as $newsContent) {
            $pos1 = stripos($newsContent, ':');
            if ($pos1 === false && count(preg_split('/\s+/', $newsContent)) > 3 && isLatin($newsContent)) {
                $fileContents[] = $newsContent;
            }
        }
        $content = implode('.', $fileContents);
    }
}

With these functions:

function isLatin($string) {
 return preg_match('/^\\s*[a-z,A-Z]/', $string) > 0;
}

function removeMultipleSpaces($string){
 return preg_replace('/\s+/', ' ',$string);
}

However, in the implode step the dot ends up attached to the next sentence, for example sentence1 .Sentence2. What I expect is sentence1. Sentence2. What is going wrong? Thank you :)

The input is a text file. For example:

ChengXiang Zhai
Department of Computer Science University of Illinois at Urbana Champaign

ABSTRACT
Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text 
information collected over time. Since most text information bears some time stamps, TTM has many applications in multiple domains, such as summarizing events in news articles and
revealing research trends in scientific literature. In this paper, we study a particular TTM 
task ­ discovering and summarizing the evolutionary patterns of themes in a text stream. We
define this new text mining problem and present general probabilistic methods for solving
this problem through (1) discovering latent themes from text; (2) constructing an evolution
graph of themes; and (3) analyzing life cycles of themes. Evaluation of the proposed methods
on two different domains (i.e., news articles and literature) shows that the proposed 
methods can discover interesting evolutionary theme patterns effectively. Categories and 
Subject Descriptors: H.3.3 [Information Search and Retrieval]: Clustering General Terms: 
Algorithms Keywords: Temporal text mining, evolutionary theme patterns, theme threads, 
clustering

1.

INTRODUCTION

Temporal Text Mining (TTM)... I only want to get the important sentences effectively

1 Answer

The sentences in the middle appear to have a trailing space, which is why the delimiter looks out of place.

Try this:

$file = file($files);
$lines = str_replace("'", '', $file);
$noMultipleSpace = removeMultipleSpaces($lines);
$fileContents = array();
foreach($noMultipleSpace as $line) {
    if (isLatin($line) && count(preg_split('/\s+/', $line)) > 25) {
        $newContent = preg_split('/\\.\\s*/', $line);
        foreach($newContent as $newsContent) {
            $pos1 = stripos($newsContent, ':');
            if ($pos1 === false && count(preg_split('/\s+/', $newsContent)) > 3 && isLatin($newsContent)) {
                $fileContents[] = $newsContent;
            }
        }
        $fileContents = array_map('trim', $fileContents);
        $content = implode('.', $fileContents);
    }
}
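
To see the trailing-space effect in isolation, here is a minimal sketch. The `$fragments` array is made-up data, not taken from the question's file; it assumes a fragment can end in a space because removeMultipleSpaces() turns the line's final newline into a single space. Joining with '. ' instead of '.' is my assumption about how to get exactly the output the question expects (sentence1. Sentence2), since trimming alone gives sentence1.Sentence2.

```php
<?php
// Hypothetical fragments, as they might look after splitting on '/\.\s*/'.
// The first one keeps a trailing space (assumed to come from the collapsed newline).
$fragments = array('sentence1 ', 'Sentence2');

// Joining as-is reproduces the misplaced dot from the question.
$before = implode('.', $fragments);

// Trim each fragment, then join with a dot followed by a space.
$trimmed = array_map('trim', $fragments);
$after = implode('. ', $trimmed);

echo $before, PHP_EOL; // sentence1 .Sentence2
echo $after, PHP_EOL;  // sentence1. Sentence2
```

Note that implode() only places the separator between elements, so a final '.' would still have to be appended by hand if the last sentence needs one.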
Answered 2012-10-08T21:14:03.207