parsing - HTML parsing - replacing newline characters

Translated from: https://stackoverflow.com/questions/10690319 2012-05-21T18:08:58.083

Viewed 257 times

I wrote some simple HTML parsing code that gets the text content at a specific XPath.

My code:

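import javax.xml.xpath.*;
import org.htmlcleaner.*;
import org.w3c.dom.NodeList;

// rawContent holds the HTML source as a String
XPathFactory xFactory = XPathFactory.newInstance();
CleanerProperties props = new CleanerProperties();
props.setNamespacesAware(false);
XPath xpathi = xFactory.newXPath();
HtmlCleaner cleaner = new HtmlCleaner(props);
TagNode node = cleaner.clean(rawContent);
org.w3c.dom.Document doc = new DomSerializer(props).createDOM(node);
// evaluate(..., NODESET) returns an Object that is really a NodeList, so cast it
NodeList obj = (NodeList) xpathi.compile("//div[@class='answer']").evaluate(doc, XPathConstants.NODESET);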

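With this, I get the result object populated with the expected answer. However, the \n characters in the answer are replaced with empty strings. For example, if the answer is 1, 2 and 3 on separate lines, the line breaks are dropped and I get 123, but I want to get 1 2 3.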

Do I need to set a property in CleanerProperties for this?

Any suggestions, please.
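
One possible workaround that does not rely on any cleaner setting is to build the text yourself from the DOM nodes instead of reading the concatenated text content. The sketch below is only an illustration under assumptions: it guesses that the line breaks come either from literal \n characters in text nodes or from <br> elements (which contribute no text on their own), and it uses the JDK parser on well-formed markup rather than HtmlCleaner so that it runs standalone. The class NewlineText and its methods collect, textWithSpaces and extract are hypothetical names, not part of HtmlCleaner.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NewlineText {

    // Hypothetical helper: appends the text found under a node, writing a
    // space wherever a <br> element occurs so adjacent lines do not fuse.
    static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
        } else if ("br".equalsIgnoreCase(node.getNodeName())) {
            out.append(' ');
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    // Collapses every whitespace run, line breaks included, to one space.
    static String textWithSpaces(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    // Demo entry point: parses well-formed markup with the JDK parser
    // (an assumption; the question uses HtmlCleaner to build the DOM),
    // evaluates the XPath and returns the text of the first match.
    static String extract(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .compile(xpath).evaluate(doc, XPathConstants.NODESET);
            return nodes.getLength() == 0 ? "" : textWithSpaces(nodes.item(0));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xpath = "//div[@class='answer']";
        System.out.println(extract("<div class='answer'>1<br/>2<br/>3</div>", xpath));
        System.out.println(extract("<div class='answer'>1\n2\n3</div>", xpath));
    }
}
```

With the DOM from the question, textWithSpaces could be called on each item of the NodeList returned by the XPath evaluation. Whether this matches the real cause depends on what the source HTML looks like, which the question does not show.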


0 answers
