iPhone Application Development

August 1, 2009

Regular Expression Engine for Objective-C

Filed under: Objective-C — groundhog @ 8:01 pm

Regular expressions are one of the more useful constructs in computing, but Cocoa offers no regular expression support in NSString or related classes. So I wrote a small engine of my own on top of the POSIX functions in <regex.h>.

RegexEngine.h

#import <Foundation/Foundation.h>
#import <regex.h>
@interface RegexResult : NSObject
{
	NSString *originalRegex;
	NSString *matchedString;
	NSArray *matches;
	NSArray *ranges;
}

@property (nonatomic, retain) NSString *originalRegex;
@property (nonatomic, retain) NSString *matchedString;
@property (nonatomic, retain) NSArray *matches;
@property (nonatomic, retain) NSArray *ranges;
+(RegexResult *)allocWithResults:(NSString *)regex matchedString:(NSString *)matched positionalMatches:(NSArray *)pmatches positionalRanges:(NSArray *)pranges;

@end

@interface RegexEngine : NSObject {

}

+(RegexResult*)match:(NSString *) regex against:(NSString *)candidate options:(int)options error:(NSError**)error;

@end

RegexEngine.m

//
//  RegexEngine.m
//  Manna Duel
//
//  Created by Robert Mark Waugh on 5/31/09.
//  Copyright 2009 __MyCompanyName__. All rights reserved.
//

#import <regex.h>
#import "RegexEngine.h"

@implementation RegexResult

@synthesize originalRegex;
@synthesize matchedString;
@synthesize matches;
@synthesize ranges;

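+(RegexResult *)allocWithResults:(NSString *)regex matchedString:(NSString *)matched positionalMatches:(NSArray *)pmatches positionalRanges:(NSArray *)pranges
{
	// The caller owns the returned object and must release or autorelease it.
	RegexResult *result = [[RegexResult alloc] init];
	// The retain properties take their own reference, so each copy is autoreleased to avoid a leak.
	result.originalRegex = [[regex copy] autorelease];
	result.matchedString = [[matched copy] autorelease];
	result.matches = [[pmatches copy] autorelease];
	result.ranges = [[pranges copy] autorelease];

	return result;
}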

-(void) dealloc
{
	// Release our own instance variables first; [super dealloc] must be the last call.
	[originalRegex release];
	[matchedString release];
	[matches release];
	[ranges release];

	[super dealloc];
}

@end

@implementation RegexEngine

+(RegexResult *)match:(NSString *) regex against:(NSString *)candidate options:(int)options error:(NSError**)error
{
	regex_t compiledRegEx;
	int compileResult = regcomp( &compiledRegEx, [regex UTF8String], options | REG_EXTENDED );
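	if( compileResult )
	{
		// Compilation failed: report the reason through the error out-parameter.
		char errorBuffer[ 1024 ];
		regerror(compileResult, &compiledRegEx, errorBuffer, sizeof( errorBuffer ) );
		NSString *errorMessage = [NSString stringWithUTF8String:errorBuffer];
		if( error ) // the domain string below is a placeholder name
			*error = [NSError errorWithDomain:@"RegexEngineErrorDomain" code:compileResult userInfo:[NSDictionary dictionaryWithObject:errorMessage forKey:NSLocalizedDescriptionKey]];
		return nil;
	}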

	regmatch_t matches[ compiledRegEx.re_nsub + 1 ]; // storage includes the entire string at 0

	int matchResult = regexec( &compiledRegEx, [candidate UTF8String], compiledRegEx.re_nsub + 1, matches, 0 );

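	if( matchResult == REG_NOMATCH )
	{
		// No match: free the compiled pattern and return an empty result.
		regfree( &compiledRegEx );
		return [RegexResult allocWithResults:regex matchedString:@"" positionalMatches:[NSArray array] positionalRanges:[NSArray array]];
	}

	if( matchResult )
	{
		// Any other nonzero code is a real failure: report it through the error out-parameter.
		char errorBuffer[ 1024 ];
		regerror(matchResult, &compiledRegEx, errorBuffer, sizeof( errorBuffer ) );
		NSString *errorMessage = [NSString stringWithUTF8String:errorBuffer];
		regfree( &compiledRegEx );
		if( error ) // the domain string below is a placeholder name
			*error = [NSError errorWithDomain:@"RegexEngineErrorDomain" code:matchResult userInfo:[NSDictionary dictionaryWithObject:errorMessage forKey:NSLocalizedDescriptionKey]];
		return nil;
	}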

	// Autoreleased arrays: the result object copies them, so nothing leaks here.
	NSMutableArray *matchedStrings = [NSMutableArray array];
	NSMutableArray *matchRanges = [NSMutableArray array];

	for( size_t i = 0; i <= compiledRegEx.re_nsub; i++ )
	{
		if( matches[ i ].rm_so == -1 )
		{
			// This group did not take part in the match: record a placeholder and move on.
			[matchedStrings addObject:@""];
			[matchRanges addObject:[NSValue valueWithRange:NSMakeRange(0, 0)]];
			continue;
		}

		size_t matchLength = matches[ i ].rm_eo - matches[ i ].rm_so;
		char matchBuffer[ matchLength + 1 ];
		strncpy(matchBuffer, [candidate UTF8String] + matches[ i ].rm_so, matchLength);
		matchBuffer[ matchLength ] = '\0';
		[matchedStrings addObject:[NSString stringWithUTF8String:matchBuffer]];
		// Note: these offsets count bytes of the UTF-8 encoding, so they equal NSString character indexes only for ASCII text.
		[matchRanges addObject:[NSValue valueWithRange:NSMakeRange(matches[ i ].rm_so, matchLength)]];
	}

	regfree( &compiledRegEx ); // release the compiled pattern before returning
	return [RegexResult allocWithResults:regex matchedString:candidate positionalMatches:matchedStrings positionalRanges:matchRanges];
}

@end
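
For readers who want to see the underlying POSIX calls in isolation, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not part of RegexEngine: the function name first_match_group, the fixed cap of 16 groups, and the truncation behaviour are all assumptions made for illustration. It walks through the same sequence the wrapper uses, namely regcomp, regexec, regerror on failure, and regfree at the end.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not part of RegexEngine): copies capture group
 * `group` of the first match of `pattern` in `text` into `out`.
 * Returns 0 on a match, REG_NOMATCH when nothing matches, or another
 * nonzero regcomp/regexec code (with a message in `out`) on error. */
int first_match_group(const char *pattern, const char *text, size_t group,
                      char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        regerror(rc, &re, out, out_size);   /* human-readable compile error */
        return rc;                          /* nothing to free after a failed regcomp */
    }

    enum { MAX_GROUPS = 16 };               /* arbitrary cap for this sketch */
    regmatch_t m[MAX_GROUPS];
    rc = regexec(&re, text, MAX_GROUPS, m, 0);
    if (rc == 0) {
        /* Slot 0 is the whole match; slots 1..re_nsub are the groups. */
        if (group < MAX_GROUPS && group <= re.re_nsub && m[group].rm_so != -1) {
            size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
            if (len >= out_size)
                len = out_size - 1;         /* truncate to fit the buffer */
            memcpy(out, text + m[group].rm_so, len);
            out[len] = '\0';
        }
    } else if (rc != REG_NOMATCH) {
        regerror(rc, &re, out, out_size);   /* a real failure, not just "no match" */
    }

    regfree(&re);                           /* always release the compiled pattern */
    return rc;
}
```

Comparing against REG_NOMATCH rather than a literal 1 keeps the "no match" case separate from genuine errors without relying on the constant's numeric value.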

Here is an example from an application I wrote that performs template substitution using the following syntax:
[First Person Male!First Person Female/Second Person]
If the person using the app is reading their own version of the template, they see the first-person male or female text, depending on their configured gender.
If the template comes from another person, the reader sees the second-person version of the template.

	RegexResult *results = [[RegexEngine match:@"\\(([^\\!]+)\\!([^\\)]+)\\)" against:text options:REG_EXTENDED error:&error] autorelease];
	while( results && [results.matches count] )
	{
		NSString *choice;
		int personIndex = self.firstPerson + 1;
		NSRange substringRange = [(NSValue *)[results.ranges objectAtIndex:0] rangeValue];
		NSString *substring = (NSString *)[results.matches objectAtIndex:personIndex];
		RegexResult *subResults = [[RegexEngine match:@"([^/]+)/(.*)" against:substring options:REG_EXTENDED error:&error] autorelease];
		if( subResults && [subResults.matches count] )
		{
			int genderIndex = self.gender + 1;
			choice = [subResults.matches objectAtIndex:genderIndex];
		}
		else
			choice = substring;

		returnValue = [returnValue stringByReplacingCharactersInRange:substringRange withString:choice];
		results = [[RegexEngine match:@"\\(([^\\!]+)\\!([^\\)]+)\\)" against:returnValue options:REG_EXTENDED error:&error] autorelease];
	}
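
The same replace-until-no-match loop can be sketched in plain C directly against the POSIX API. This is only an illustration under stated assumptions: it mirrors the patterns used in the snippet above (parenthesized alternatives split on '!' and then on '/'), the function name expand_template and the fixed buffer sizes are hypothetical, and `person` and `gender` are taken to be 0 or 1, selecting the first or second alternative.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: expands every "(A!B)" placeholder in `text` into `out`.
 * Returns 0 on success, -1 if a pattern fails to compile or a buffer is too small. */
int expand_template(const char *text, int person, int gender,
                    char *out, size_t out_size)
{
    assert(person == 0 || person == 1);
    assert(gender == 0 || gender == 1);

    regex_t outer, inner;
    if (regcomp(&outer, "\\(([^!]+)!([^)]+)\\)", REG_EXTENDED) != 0)
        return -1;
    if (regcomp(&inner, "([^/]+)/(.*)", REG_EXTENDED) != 0) {
        regfree(&outer);
        return -1;
    }

    int status = 0;
    if (strlen(text) >= out_size)
        status = -1;
    else
        strcpy(out, text);

    regmatch_t m[3], s[3];
    while (status == 0 && regexec(&outer, out, 3, m, 0) == 0) {
        /* Pick the alternative before or after '!'. */
        char choice[256];
        size_t len = (size_t)(m[person + 1].rm_eo - m[person + 1].rm_so);
        if (len >= sizeof choice) { status = -1; break; }
        memcpy(choice, out + m[person + 1].rm_so, len);
        choice[len] = '\0';

        /* If that alternative itself contains "X/Y", narrow it by gender. */
        if (regexec(&inner, choice, 3, s, 0) == 0) {
            size_t sublen = (size_t)(s[gender + 1].rm_eo - s[gender + 1].rm_so);
            memmove(choice, choice + s[gender + 1].rm_so, sublen);
            choice[sublen] = '\0';
        }

        /* Splice prefix + choice + suffix, then scan the new string again. */
        char tmp[1024];
        int n = snprintf(tmp, sizeof tmp, "%.*s%s%s",
                         (int)m[0].rm_so, out, choice, out + m[0].rm_eo);
        if (n < 0 || (size_t)n >= sizeof tmp || (size_t)n >= out_size) {
            status = -1;
            break;
        }
        strcpy(out, tmp);
    }

    regfree(&outer);
    regfree(&inner);
    return status;
}
```

Each pass replaces a match with a strictly shorter piece of itself, so the loop always terminates. Compiling both patterns once, outside the loop, also avoids the recompilation that the class-method interface performs on every call.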

I plan to improve on this over time as necessary.
Improvements desired:

  • Make the regex engine itself more object-oriented: compile regular expressions into regex objects that provide match and substitution methods.
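
One possible shape for that improvement, sketched in plain C rather than as a finished Objective-C class: a small "regex object" that is compiled once and then offers match and substitution operations. Every name here (CompiledRegex, rx_compile, rx_matches, rx_substitute_first, rx_free) is hypothetical, and the substitution replaces only the first match with a literal string.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "regex object": compile once, then match or substitute many times. */
typedef struct {
    regex_t re;
    int compiled;   /* nonzero once regcomp has succeeded */
} CompiledRegex;

/* Returns 0 on success, -1 if the pattern does not compile. */
int rx_compile(CompiledRegex *r, const char *pattern)
{
    assert(r != NULL);
    r->compiled = (regcomp(&r->re, pattern, REG_EXTENDED) == 0);
    return r->compiled ? 0 : -1;
}

/* Returns 1 if `text` contains a match, 0 otherwise. */
int rx_matches(const CompiledRegex *r, const char *text)
{
    return r->compiled && regexec(&r->re, text, 0, NULL, 0) == 0;
}

/* Writes `text` to `out` with its first match replaced by `replacement`.
 * `out` must not overlap `text`. Returns 1 if a replacement was made,
 * 0 if nothing matched (text is copied unchanged), -1 if `out` is too small. */
int rx_substitute_first(const CompiledRegex *r, const char *text,
                        const char *replacement, char *out, size_t out_size)
{
    regmatch_t m[1];
    int found = r->compiled && regexec(&r->re, text, 1, m, 0) == 0;
    int n;
    if (found)
        n = snprintf(out, out_size, "%.*s%s%s",
                     (int)m[0].rm_so, text, replacement, text + m[0].rm_eo);
    else
        n = snprintf(out, out_size, "%s", text);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return found;
}

/* Releases the compiled pattern; safe to call more than once. */
void rx_free(CompiledRegex *r)
{
    if (r->compiled) {
        regfree(&r->re);
        r->compiled = 0;
    }
}
```

In an Objective-C version, the struct would presumably become an instance variable, rx_compile an initializer, and rx_free the body of dealloc, so the compiled pattern lives exactly as long as the object.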