Objective-CでNSStringをトークン化/分割する最良の方法は何ですか?
9 に答える
ここで答えが見つかりました:
NSString *string = @"oop:ack:bork:greeble:ponies";
NSArray *chunks = [string componentsSeparatedByString: @":"];
文字列を分割するだけの場合は、 を使用します-[NSString componentsSeparatedByString:]
。より複雑なトークン化には、NSScanner クラスを使用します。
トークン化のニーズがより複雑な場合は、オープン ソースの Cocoa String トークン化/解析ツールキットをチェックしてください: ParseKit:
区切り文字 (「:」など) を使用して文字列を単純に分割する場合、ParseKit は間違いなくやり過ぎです。しかし、繰り返しますが、複雑なトークン化のニーズに対して、ParseKit は非常に強力で柔軟です。
ParseKit Tokenization documentationも参照してください。
複数の文字でトークン化する場合は、NSString のcomponentsSeparatedByCharactersInSet
. whitespaceCharacterSet
NSCharacterSet には、やなどの便利な既製のセットがいくつかありillegalCharacterSet
ます。また、Unicode 範囲の初期化子があります。
次のように、文字セットを組み合わせてトークン化することもできます。
// Tokenize sSourceEntityName on both whitespace and punctuation.
NSMutableCharacterSet *mcharsetWhitePunc = [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
[mcharsetWhitePunc formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSArray *sarrTokenizedName = [self.sSourceEntityName componentsSeparatedByCharactersInSet:mcharsetWhitePunc];
[mcharsetWhitePunc release];
連続して charSet の複数のメンバーに遭遇すると空白文字列が生成されることに注意してくださいcomponentsSeparatedByCharactersInSet
。そのため、1 未満の長さをテストすることをお勧めします。
文字列 (単語、段落、文字、文、および行) の言語機能を分割する場合は、文字列列挙を使用します。
NSString * string = @" \n word1! word2,%$?'/word3.word4 ";
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
options:NSStringEnumerationByWords
usingBlock:
^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
NSLog(@"Substring: '%@'", substring);
}];
// Logs:
// Substring: 'word1'
// Substring: 'word2'
// Substring: 'word3'
// Substring: 'word4'
この API は、スペースが必ずしも区切り文字ではない他の言語 (日本語など) で機能します。またNSStringEnumerationByComposedCharacterSequences
、多くの非西洋文字は 1 バイトを超える長さであるため、 を使用して文字を列挙するのが適切な方法です。
1) トークンをタイプに分類する
2) 新しいトークンを追加する
3) 「{」と「} の間のすべての単語のように、カスタム クロージャ間で文字列を分離する「
そのような要件については、Parse Kitが命の恩人であることがわかりました。
私はそれを使用して、非常に高速で軽量な .PGN (prtable Gaming Notation) ファイルを正常に解析しました。
ldapsearch を使用した LDAP クエリの後で、コンソール出力を分割しなければならない場合がありました。最初に NSTask をセットアップして実行します (ここで良いコード サンプルを見つけました: Execute a terminal command from a Cocoa app )。しかし、Ldap-query-output からプリント サーバー名のみを抽出するために、出力を分割して解析する必要がありました。残念ながら、単純な C 配列操作で C ストリング/配列を操作する場合、それはまったく問題にならないかなり面倒なストリング操作です。これがココアオブジェクトを使用した私のコードです。より良い提案があれば、私に知らせてください。
//as the ldap query has to be done when the user selects one of our Active Directory Domains
//(an according comboBox should be populated with print-server names we discover from AD)
//my code is placed in the onSelectDomain event code
//the following variables are declared in the interface .h file as globals
@protected NSArray* aDomains;//domain combo list array
@protected NSMutableArray* aPrinters;//printer combo list array
@protected NSMutableArray* aPrintServers;//print server combo list array
@protected NSString* sLdapQueryCommand;//for LDAP Queries
@protected NSArray* aLdapQueryArgs;
@protected NSTask* tskLdapTask;
@protected NSPipe* pipeLdapTask;
@protected NSFileHandle* fhLdapTask;
@protected NSMutableData* mdLdapTask;
IBOutlet NSComboBox* comboDomain;
IBOutlet NSComboBox* comboPrinter;
IBOutlet NSComboBox* comboPrintServer;
//end of interface globals
//after collecting the print-server names they are displayed in an according drop-down comboBox
//as soon as the user selects one of the print-servers, we should start a new query to find all the
//print-queues on that server and display them in the comboPrinter drop-down list
//to find the shares/print queues of a windows print-server you need samba and the net -S command like this:
// net -S yourPrintServerName.yourBaseDomain.com -U yourLdapUser%yourLdapUserPassWord -W adm rpc share -l
//which dispalays a long list of the shares
- (IBAction)onSelectDomain:(id)sender
{
static int indexOfLastItem = 0; //unfortunately we need to compare this because we are called also if the selection did not change!
if ([comboDomain indexOfSelectedItem] != indexOfLastItem && ([comboDomain indexOfSelectedItem] != 0))
{
indexOfLastItem = [comboDomain indexOfSelectedItem]; //retain this index for next call
//the print-servers-list has to be loaded on a per univeristy or domain basis from a file dynamically or from AN LDAP-QUERY
//initialize an LDAP-Query-Task or console-command like this one with console output
/*
ldapsearch -LLL -s sub -D "cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com" -h "yourLdapServer.com" -p 3268 -w "yourLdapUserPassWord" -b "dc=yourBaseDomainToSearchIn,dc=com" "(&(objectcategory=computer)(cn=ps*))" "dn"
//our print-server names start with ps* and we want the dn as result, wich comes like this:
dn: CN=PSyourPrintServerName,CN=Computers,DC=yourBaseDomainToSearchIn,DC=com
*/
sLdapQueryCommand = [[NSString alloc] initWithString: @"/usr/bin/ldapsearch"];
if ([[comboDomain stringValue] compare: @"firstDomain"] == NSOrderedSame) {
aLdapQueryArgs = [NSArray arrayWithObjects: @"-LLL",@"-s", @"sub",@"-D", @"cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com",@"-h", @"yourLdapServer.com",@"-p",@"3268",@"-w",@"yourLdapUserPassWord",@"-b",@"dc=yourFirstDomainToSearchIn,dc=com",@"(&(objectcategory=computer)(cn=ps*))",@"dn",nil];
}
else {
aLdapQueryArgs = [NSArray arrayWithObjects: @"-LLL",@"-s", @"sub",@"-D", @"cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com",@"-h", @"yourLdapServer.com",@"-p",@"3268",@"-w",@"yourLdapUserPassWord",@"-b",@"dc=yourSecondDomainToSearchIn,dc=com",@"(&(objectcategory=computer)(cn=ps*))",@"dn",nil];
}
//prepare and execute ldap-query task
tskLdapTask = [[NSTask alloc] init];
pipeLdapTask = [[NSPipe alloc] init];//instead of [NSPipe pipe]
[tskLdapTask setStandardOutput: pipeLdapTask];//hope to get the tasks output in this file/pipe
//The magic line that keeps your log where it belongs, has to do with NSLog (see https://stackoverflow.com/questions/412562/execute-a-terminal-command-from-a-cocoa-app and here http://www.cocoadev.com/index.pl?NSTask )
[tskLdapTask setStandardInput:[NSPipe pipe]];
//fhLdapTask = [[NSFileHandle alloc] init];//would be redundand here, next line seems to do the trick also
fhLdapTask = [pipeLdapTask fileHandleForReading];
mdLdapTask = [NSMutableData dataWithCapacity:512];//prepare capturing the pipe buffer which is flushed on read and can overflow, start with 512 Bytes but it is mutable, so grows dynamically later
[tskLdapTask setLaunchPath: sLdapQueryCommand];
[tskLdapTask setArguments: aLdapQueryArgs];
#ifdef bDoDebug
NSLog (@"sLdapQueryCommand: %@\n", sLdapQueryCommand);
NSLog (@"aLdapQueryArgs: %@\n", aLdapQueryArgs );
NSLog (@"tskLdapTask: %@\n", [tskLdapTask arguments]);
#endif
[tskLdapTask launch];
while ([tskLdapTask isRunning]) {
[mdLdapTask appendData: [fhLdapTask readDataToEndOfFile]];
}
[tskLdapTask waitUntilExit];//might be redundant here.
[mdLdapTask appendData: [fhLdapTask readDataToEndOfFile]];//add another read for safety after process/command stops
NSString* sLdapOutput = [[NSString alloc] initWithData: mdLdapTask encoding: NSUTF8StringEncoding];//convert output to something readable, as NSData and NSMutableData are mere byte buffers
#ifdef bDoDebug
NSLog(@"LdapQueryOutput: %@\n", sLdapOutput);
#endif
//Ok now we have the printservers from Active Directory, lets parse the output and show the list to the user in its combo box
//output is formatted as this, one printserver per line
//dn: CN=PSyourPrintServer,OU=Computers,DC=yourBaseDomainToSearchIn,DC=com
//so we have to search for "dn: CN=" to retrieve each printserver's name
//unfortunately splitting this up will give us a first line containing only "" empty string, which we can replace with the word "choose"
//appearing as first entry in the comboBox
aPrintServers = (NSMutableArray*)[sLdapOutput componentsSeparatedByString:@"dn: CN="];//split output into single lines and store it in the NSMutableArray aPrintServers
#ifdef bDoDebug
NSLog(@"aPrintServers: %@\n", aPrintServers);
#endif
if ([[aPrintServers objectAtIndex: 0 ] compare: @"" options: NSLiteralSearch] == NSOrderedSame){
[aPrintServers replaceObjectAtIndex: 0 withObject: slChoose];//replace with localized string "choose"
#ifdef bDoDebug
NSLog(@"aPrintServers: %@\n", aPrintServers);
#endif
}
//Now comes the tedious part to extract only the print-server-names from the single lines
NSRange r;
NSString* sTemp;
for (int i = 1; i < [aPrintServers count]; i++) {//skip first line with "choose". To get rid of the rest of the line, we must isolate/preserve the print server's name to the delimiting comma and remove all the remaining characters
sTemp = [aPrintServers objectAtIndex: i];
sTemp = [sTemp stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceAndNewlineCharacterSet]];//remove newlines and line feeds
#ifdef bDoDebug
NSLog(@"sTemp: %@\n", sTemp);
#endif
r = [sTemp rangeOfString: @","];//now find first comma to remove the whole rest of the line
//r.length = [sTemp lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
r.length = [sTemp length] - r.location;//calculate number of chars between first comma found and lenght of string
#ifdef bDoDebug
NSLog(@"range: %i, %i\n", r.location, r.length);
#endif
sTemp = [sTemp stringByReplacingCharactersInRange:r withString: @"" ];//remove rest of line
#ifdef bDoDebug
NSLog(@"sTemp after replace: %@\n", sTemp);
#endif
[aPrintServers replaceObjectAtIndex: i withObject: sTemp];//put back string into array for display in comboBox
#ifdef bDoDebug
NSLog(@"aPrintServer: %@\n", [aPrintServers objectAtIndex: i]);
#endif
}
[comboPrintServer removeAllItems];//reset combo box
[comboPrintServer addItemsWithObjectValues:aPrintServers];
[comboPrintServer setNumberOfVisibleItems:aPrintServers.count];
[comboPrintServer selectItemAtIndex:0];
#ifdef bDoDebug
NSLog(@"comboPrintServer reloaded with new values.");
#endif
//release memory we used for LdapTask
[sLdapQueryCommand release];
[aLdapQueryArgs release];
[sLdapOutput release];
[fhLdapTask release];
[pipeLdapTask release];
// [tskLdapTask release];//strangely can not be explicitely released, might be autorelease anyway
// [mdLdapTask release];//strangely can not be explicitely released, might be autorelease anyway
[sTemp release];
}
}