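I need the character offsets of words and of paragraph tags in a text file, measured with respect to the START of the FILE.
Suppose the word is "he". I need to get all the char offsets (start, end) of that word, which can occur many times inside <p>......</p>, and also the offsets (start, end) of every <p>..</p> that contains the word "he".
PS: The words to match are in an array. I need to write the offsets to a file.
A sample of the files I need to search is as follows: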
<DOC>
<DOCID> NYT_ENG_20070702.0006.LDC2009T13 </DOCID>
<DOCTYPE SOURCE="newswire"> NEWS STORY </DOCTYPE>
<DATETIME> 2007-07-02 </DATETIME>
<BODY>
<HEADLINE>
CENTRIST PLATFORMS OF NEW LATINO POLITICIANS INDICATE NEW ERE
</HEADLINE>
<TEXT>
<P>
LOS ANGELES
</P>
<P>
Almost three decades ago, a politically connected Hollywood
restaurateur and her husband organized a massive rally to show
Latino support for several Latino elected officials who had
become the targets of negative news coverage.
</P>
<P>
Then-Gov. Jerry Brown headlined the dinner of about 7,000 at the
Los Angeles Convention Center on behalf of then-State Education
Secretary Mario Obledo, who was battling allegations of ties to
the Mexican Mafia, and Congressman Ed Roybal, who was the subject
of a corruption scandal.
</P>
<P>
Today, that Hollywood restaurateur continues to lobby for Latino
issues and candidates, but she said she has no misgivings about
the recent negative coverage of Southern California's top three
Latino elected officials: Mayor Antonio Villaraigosa, Sheriff Lee
Baca and City Attorney Rocky Delgadillo.
</P>
<P>
"It's a different era and different times," said Lucy Casado, a
one-time commission appointee in the Brown administration whose
late husband, Frank, was a co-founder of the state's largest
Latino political organization, the Mexican American Political
Association.
</P>
<P>
What is different today, Casado and many other Latino-rights
advocates say, is that Latinos are no longer a minority in many
parts of California and there is growing recognition in the
community that, increasingly, elected Latino officials are like
almost all elected officials -- blessed with the same positives
and cursed with the same shortcomings.
</P>
<P>
"Antonio, Lee and Rocky are all experiencing the same public
scrutiny that any elected official experiences," said Harry
Pachon, executive director of the Tomas Rivera Policy Institute,
a Latino think tank at the University of Southern California.
</P>
<P>
"The controversy and criticism of them is from their actions as
elected officials, not because of their role as ethnics."
</P>
<P>
The three local leaders have grabbed the wrong kind of headlines
during the past few weeks for various foibles and questionable
decisions.
</P>
<P>
Villaraigosa recently separated from his wife; Baca has come
under fire for his handling of celebrity heiress Paris Hilton's
jailing; and Delgadillo has taken it on the chin over a variety
of issues involving his wife.
</P>
<P>
Pachon and others also say the unique dichotomy of Latino
reaction to their political troubles as elected officials -- and
not as ethnic representatives -- also underscores the increasing
sophistication of the growing Latino voter base.
</P>
<P>
Casado points to a famous mural on the Latino Eastside of Los
Angeles as a symbol of that change.
</P>
<P>
Along Olympic Boulevard in Boyle Heights, the mural on a public
housing project wall shows the fierce, hypnotic eyes of a Latino
activist with flowing hair and a Che Guevara look beaming like
headlights through a fog. Its message states simply, "We Are Not
a Minority."
</P>
<P>
Today, however, those symbolic eyes are looking upon a
dramatically changed sociological and political landscape that
bears little resemblance to what the mural's artist saw back in
the 1970s.
</P>
<P>
Call it Latino power. Call it the emergence of Mexican America.
No longer are Latinos talking about attaining power in
California, home to the nation's biggest Latino population. Now,
the conversation focuses on what they should do with it.
</P>
<P>
And the civil rights-obsessed, Latinos-as-victims rhetoric of the
past is yielding to centrist platforms that spotlight education
reform, children's health care and lower taxes.
</P>
<P>
"The Latino agenda," as Villaraigosa puts it, "is the American
agenda."
</P>
<P>
In California, Latinos hold 1,163 elective offices statewide and
account for almost 25percent of the 120 state Senate and Assembly
members, according to the National Association of Latino Elected
Officials.
</P>
<P>
Those figures have more than quadrupled in the past decade, even
though Latinos account for fewer than one in four voters
statewide.
</P>
<P>
Eight of the state's 53 congressional representatives are Latino.
In 29 of those districts, the Latino population is 100,000 or
more, so there is anticipation of even greater Latino
representation, especially since they are projected to account
for 40percent of the state's population by 2015.
</P>
<P>
Eighty-five percent of California's Latinos are Mexican-Americans
or Mexican immigrants.
</P>
<P>
Pachon said the state Governor's Office and the U.S. Senate are
within reach of Latinos in 10 years.
</P>
<P>
"It's truer today than ever," he said, "and there are any number
of candidates who could potentially rise to that. Of course,
there's Antonio and (Assembly Speaker) Fabian (Nuñez), but there
are others coming up as well."
</P>
<P>
"Obviously, we're in a better position today because we have the
electability here and our populations are growing in numbers,"
said Rep. Hilda Solis, who represents the 32nd Congressional
District in the San Gabriel Valley.
</P>
<P>
"I think people are starting to say, 'Wow, we do have that talent
here. We do have the ability to get people elected."'
</P>
<P>
The rise in Latino power in California has been surging since the
early 1990s, Pachon and others say, but only in part because of
the population growth.
</P>
<P>
Historically, Latinos vote in numbers well below their share of
the population, partly because many are either too young to vote,
unregistered or foreign citizens.
</P>
<P>
But as the Latino population increases in the state, so has
voting participation.
</P>
<P>
In 1992, Latinos accounted for about 8percent of Californians
going to the polls; in 2006, the Latino vote hit 14percent,
according to the Public Policy Institute of California.
</P>
<P>
But Latinos also began mobilizing politically as a backlash
against the 1990s administration of former Republican Gov. Pete
Wilson and Proposition187. That Wilson-supported ballot measure
-- which California voters approved in 1994 -- would have
eliminated social benefits for undocumented immigrants.
</P>
<P>
"Republicans were basically sent to Siberia," said Allan
Hoffenblum, GOP strategist and co-editor of the California Target
Book, a nonpartisan publication tracking state and federal races.
</P>
<P>
In 1998, Cruz Bustamante's election as lieutenant governor marked
the first time a Mexican- American had been elected to a
statewide office since the 19th century.
</P>
<P>
"A thoughtfulness has occurred in the California electorate,"
said Bustamante, who left office last year when he lost in the
race for state insurance commissioner. "(Voters) are looking past
the name; they are looking past the facade of the individual,
what the person looks like....
</P>
<P>
"(Latinos) have learned how to become elected in areas that have
extremely small Latino populations. ... You have to be
mainstream. There is a Latino agenda. It's good schools, a decent
job to take care of your family -- it's the same as everybody
else's agenda."
</P>
<P>
Today, Latino power has also changed in tone and tactics, experts
say. As Villaraigosa's election in Los Angeles in 2005 showed,
Latino power is less nationalistic and more often built around
coalitions reflecting the diversity of California and a
willingness to share the power.
</P>
<P>
Part of it is acknowledging the reality of voting numbers that
remain significantly below population numbers. Part of it is that
for all the Latino gains in elected offices, the voting figures
are still a drop in the bucket, experts say.
</P>
<P>
"Here's a sobering statistic," Pachon said. "While Latinos hold
just over 1,000 elective offices statewide, that's out of a total
of almost 19,000 offices statewide. Latinos still hold only about
5percent of all the elective offices in California."
</P>
<P>
Last week's special election in the Long Beach-South Los Angeles
area to fill the 37th Congressional District seat left vacant by
the death of Rep. Juanita Millender-McDonald reflected the Latino
leadership's approach to affecting power in a diverse society.
</P>
<P>
In a traditionally African-American district -- where the black
registered-voter population has declined over the years to almost
a fourth while Latinos now account for a fifth -- African-
American Assemblywoman Laura Richardson won what was tantamount
to election, defeating closest rival state Sen. Jenny Oropeza,
D-Long Beach.
</P>
<P>
Richardson had on her side Nuñez and all the political and
fundraising clout of the Assembly speakership, as well as the
endorsement of the powerful L.A. County Federation of Labor.
</P>
<P>
It is the same message Villaraigosa drove home in his second
mayoral election, when some criticized him for playing down
ethnic politics at the expense of building a broader
constituency.
</P>
<P>
"We want to see more Latinos elected," Villaraigosa said in an
interview during the campaign, "but they have to be people that
want to represent everybody. They have to be people that want to
identify the commonalities that we share.
</P>
<P>
"So to the extent that they are bridge-builders, yes, we want to
get people like that elected."
</P>
</TEXT>
</BODY>
</DOC>
I need to do this for thousands of files, so what is the fastest way to get the offsets?
I am trying to use split() and contains(), but it is very tedious. I need a quick way to do it.
Please help.
The extension of the file shown here is .sgm.
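To make the requirement concrete, here is a rough sketch of the kind of thing I am after, using java.util.regex instead of split() and contains(). The class name OffsetFinder, the method findOffsets, and the output line format are made up for illustration. It assumes the whole file has been read into one String, so the offsets are Java char indices from the start of that String (end exclusive), not byte offsets, and the two can differ when the file contains non-ASCII characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class OffsetFinder {
    // One <P>...</P> block; DOTALL lets '.' cross line breaks.
    private static final Pattern PARA =
            Pattern.compile("<P>(.*?)</P>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns lines such as "P 0 19" and "he 4 6" (start inclusive, end exclusive).
    static List<String> findOffsets(String content, String[] words) {
        // Combine all words into one whole-word alternation, e.g. \b(?:he|she)\b
        StringBuilder alt = new StringBuilder();
        for (String w : words) {
            if (alt.length() > 0) alt.append('|');
            alt.append(Pattern.quote(w));
        }
        Pattern word = Pattern.compile("\\b(?:" + alt + ")\\b");

        List<String> out = new ArrayList<>();
        Matcher p = PARA.matcher(content);
        while (p.find()) {
            // Search only inside this paragraph, but keep offsets relative to the file.
            Matcher w = word.matcher(content);
            w.region(p.start(1), p.end(1));
            w.useTransparentBounds(true);
            boolean first = true;
            while (w.find()) {
                if (first) {
                    // Report the paragraph only if it contains at least one match.
                    out.add("P " + p.start() + " " + p.end());
                    first = false;
                }
                out.add(w.group() + " " + w.start() + " " + w.end());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "<P>\nhe said he\n</P>\n<P>\nshe\n</P>\n";
        for (String line : findOffsets(sample, new String[] {"he"})) {
            System.out.println(line);
        }
    }
}
```

For the real files, the String would come from reading each .sgm file in full with its line endings preserved, and the returned lines would be written to the output file. The word match here is case-sensitive and whole-word only, so "she" does not count as "he".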