Want to learn more about what’s really happening inside those square brackets? Read the entire Inside the Bracket series.
Last time we took a look at the motivations behind Objective-C message sending. It’s a layer of indirection that lets one chunk of code treat other, perhaps unrelated, chunks of code in a uniform manner. “I have a pile of views here, I shall draw them all with -drawRect:, and I don’t care if the views are Buttons, Sliders, or World Maps.” There’s a loop that hits a collection and sends the same message to a bunch of objects:
for (NSView *view in visibleviews) {
    [view drawRect: view.bounds];
}
Objective-C performs this magic by having a collection associated with each object. This collection has a mapping of names (like drawRect:) to chunks of code. The mapping of names to code happens at runtime, rather than compile or link time. This deferral comes at a slight performance penalty, but gives us access to interesting under-the-hood data at runtime, and also gives us a lot of flexibility.
What is this extra data? What is this flexibility? Glad you asked.
Each object that’s floating around in memory is described by a class. In Objective-C, a class is an object too – it can receive messages. The class holds all the metadata that describes the instances of that class – things like the set of methods it implements, what protocols it adopts, any @properties, its instance variables, and so on. The class also has a reference to the one-and-only superclass of the class. Say UIButton inherits from UIView. This means that UIButton’s superclass is UIView.
An individual object is a dynamically allocated chunk of memory that contains the instance variables of the object, such as the view’s bounds or the background fill color for a layer. The object also has a pointer, called isa, at a predictable location (the first 4 or 8 bytes of the object). This points to the class. “isa” comes from “is-a”. This object in memory is-a button because its isa pointer points to the UIButton class. It’s now very easy for you, when given any random Objective-C object, to find out where its class lives. Just look at the first pointer.
Here’s how all the chunks of data relate:
What’s going on in this diagram? A button’s isa pointer points to the UIButton class. The class has (amongst other stuff) a reference to its map of methods. It has a reference to its superclass (UIView) as well. UIButton implements some button-specific methods like drawRect and style, as well as the mysterious “blah” method. You can see that UIView implements some generic housekeeping methods, like the frame handling, background color storage, and it has its own version of blah.
So, that method map. It’s a dictionary of names and methods.
The name is called the selector. What’s a selector then? It’s the key into that dictionary. A selector is actually a char *, so you can print one out in gdb or lldb. This is an implementation detail, but makes for a nice debugging feature. The current selector is passed in a hidden argument called _cmd, so you can “print _cmd” inside of your debugger. If you’re going to and from the string representation of selectors for actual work (say, to store some selector names in a Cocoa collection), use the NSStringFromSelector and NSSelectorFromString functions.
The selector is one side of the method map. The method implementation is on the other side. It’s the address of a function to call. It’s an IMP, a short (and shouty) type name that stands for “implementation”. An IMP is a function pointer which points to a function that takes an id and a selector:
typedef id (*IMP)(id, SEL, ...);
Look familiar? That’s the same as the arguments to objc_msgSend. It’s also the same two hidden parameters to methods: self and _cmd.
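To make the moving parts concrete, here is a toy model of that diagram in plain C (which is also valid Objective-C). It is a sketch and not the real runtime: ToyClass, ToyObject, ToyIMP, toy_lookup and toy_send are made-up names, selectors are plain C strings compared with strcmp, and the real objc_msgSend adds caching and a good deal more. What it does show is the shape of the thing: an isa pointer at the front of the object, a per-class map from selector to function pointer, and a lookup that climbs to the superclass when a class has no entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical stand-ins, not the real runtime structures. */
typedef struct ToyObject ToyObject;
typedef const char *(*ToyIMP)(ToyObject *self, const char *cmd);

typedef struct {
    const char *selector;   /* the key: a method name */
    ToyIMP imp;             /* the value: the code to run */
} ToyMethod;

typedef struct ToyClass {
    const char *name;
    const struct ToyClass *superclass;  /* one-and-only superclass, NULL at the root */
    const ToyMethod *methods;           /* the method map */
    size_t method_count;
} ToyClass;

struct ToyObject {
    const ToyClass *isa;    /* first field, so it sits at a predictable location */
    /* instance variables would follow here */
};

static const char *view_frame(ToyObject *self, const char *cmd) { (void)self; (void)cmd; return "UIView frame"; }
static const char *view_blah(ToyObject *self, const char *cmd) { (void)self; (void)cmd; return "UIView blah"; }
static const char *button_draw_rect(ToyObject *self, const char *cmd) { (void)self; (void)cmd; return "UIButton drawRect:"; }
static const char *button_blah(ToyObject *self, const char *cmd) { (void)self; (void)cmd; return "UIButton blah"; }

static const ToyMethod view_methods[] = {
    { "frame", view_frame },
    { "blah", view_blah },
};
static const ToyMethod button_methods[] = {
    { "drawRect:", button_draw_rect },
    { "blah", button_blah },
};

static const ToyClass ToyUIView = { "UIView", NULL, view_methods, 2 };
static const ToyClass ToyUIButton = { "UIButton", &ToyUIView, button_methods, 2 };

/* Look up a selector, starting at the object's class and walking up the superclass chain. */
ToyIMP toy_lookup(const ToyObject *object, const char *selector)
{
    assert(object != NULL && selector != NULL);
    for (const ToyClass *cls = object->isa; cls != NULL; cls = cls->superclass) {
        for (size_t i = 0; i < cls->method_count; i++) {
            if (strcmp(cls->methods[i].selector, selector) == 0) {
                return cls->methods[i].imp;
            }
        }
    }
    return NULL;  /* nobody in the chain implements it */
}

/* A toy "message send": find the code, then jump through it with self and _cmd. */
const char *toy_send(ToyObject *object, const char *selector)
{
    ToyIMP imp = toy_lookup(object, selector);
    return imp ? imp(object, selector) : "unrecognized selector";
}
```

Sending “blah” to a toy button finds the UIButton version first, while “frame” falls through to UIView because the button’s own map has no entry for it.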
You can ask an object for that function pointer that’s behind a particular message. It’s a function pointer, so you can jump through it like any other function pointer. Here’s some code that takes a string, gets the function pointer behind -uppercaseString, and then jumps through it directly. (Some code at this gist)
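NSString *string = @"Bork";
// Recent SDKs declare IMP as void (*)(void) by default, so cast it to the method's real shape before calling.
typedef NSString *(*UppercaseFunction)(id, SEL);
UppercaseFunction uppercase = (UppercaseFunction)[string methodForSelector: @selector(uppercaseString)];
NSString *upcase = uppercase (string, @selector(flonknozzle));
NSLog (@"%@ -> %@", string, upcase);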
And a sample run:
Bork -> BORK
Notice that I passed a nonsense selector as the second argument (_cmd). That shows that uppercase is just a function pointer and not actually vectoring through objc_msgSend.
Ordinarily you won’t be grabbing the IMP and jumping through it because it defeats the whole idea of polymorphism – you’re short-circuiting the method lookup process. But if you know that all the objects in a collection are the same kind, you can get the IMP once and jump directly to the method implementation. Be sure you’ve profiled your app before doing this kind of micro-optimization. It could become the source of bugs if your homogeneous collection starts having different kinds of objects in it. “Why is UIButton’s -drawRect: suddenly trying to draw sliders?”
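Here is that trade-off in miniature, again as a toy model in plain C and not real runtime code. Shape, ShapeClass, lookup and the lookups counter are all hypothetical; the counter exists only so you can see how often the slow path runs. The cached version does one lookup for the whole array, and it guards the “everything is the same class” assumption with an assert, since that assumption is exactly what breaks when the collection stops being homogeneous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a "class" is just a table mapping selector names to function pointers. */
typedef struct Shape Shape;
typedef int (*AreaIMP)(const Shape *self);

typedef struct {
    const char *selector;
    AreaIMP imp;
} Method;

typedef struct {
    const Method *methods;
    size_t count;
} ShapeClass;

struct Shape {
    const ShapeClass *isa;
    int width;
    int height;
};

static int lookups = 0;  /* counts how often the by-name lookup runs */

static int rect_area(const Shape *self) { return self->width * self->height; }

static const Method rect_methods[] = { { "area", rect_area } };
static const ShapeClass RectClass = { rect_methods, 1 };

/* The slow path: search the class's method table by name. */
AreaIMP lookup(const Shape *shape, const char *selector)
{
    assert(shape != NULL && selector != NULL);
    lookups++;
    for (size_t i = 0; i < shape->isa->count; i++) {
        if (strcmp(shape->isa->methods[i].selector, selector) == 0) {
            return shape->isa->methods[i].imp;
        }
    }
    return NULL;
}

/* Dynamic dispatch: one lookup per element, safe for mixed collections. */
int total_area_dynamic(const Shape *shapes, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        AreaIMP imp = lookup(&shapes[i], "area");
        if (imp != NULL) total += imp(&shapes[i]);
    }
    return total;
}

/* Cached: one lookup in total. Only valid while every element shares one class. */
int total_area_cached(const Shape *shapes, size_t n)
{
    if (n == 0) return 0;
    AreaIMP imp = lookup(&shapes[0], "area");
    if (imp == NULL) return 0;
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(shapes[i].isa == shapes[0].isa);  /* guard the homogeneity assumption */
        total += imp(&shapes[i]);
    }
    return total;
}
```

Both functions return the same total for an array of rectangles; the difference is that the dynamic one bumps the lookup counter once per element and the cached one bumps it once.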
A “signature” is the term for the types that a method or function takes as parameters and what its return value is. The signature of NSData’s dataWithContentsOfFile: is “Takes a string (path) and returns an object (an NSData of the contents of the file at the path)”.
Method signatures are part of a class’s metadata. You can ask a class for the signature of one of its instance methods using instanceMethodSignatureForSelector:. The signature is encapsulated in an NSMethodSignature object that you can then poke around. Here’s an NSString method with some parameters:
- (NSRange) rangeOfCharacterFromSet: (NSCharacterSet *) aSet
                            options: (NSStringCompareOptions) mask
                              range: (NSRange) searchRange;
This method takes an object, a bit mask, and an NSRange struct. It returns an NSRange. You get the signature by asking the class:
NSMethodSignature *signature =
[NSString instanceMethodSignatureForSelector: @selector(rangeOfCharacterFromSet:options:range:)];
Now poke around the signature:
NSLog (@"%lu arguments", (unsigned long)[signature numberOfArguments]);
for (NSUInteger i = 0; i < [signature numberOfArguments]; i++) {
    NSLog (@"%lu -> %s", (unsigned long)i, [signature getArgumentTypeAtIndex: i]);
}
NSLog (@"returning %s", [signature methodReturnType]);
Running this yields this extremely illuminating output:
5 arguments
0 -> @
1 -> :
2 -> @
3 -> Q
4 -> {_NSRange=QQ}
returning {_NSRange=QQ}
Ummm… Yeah. Moving on then!
These character strings are “type encodings”. They’re character sequences that describe individual types. You can ask the compiler for a type’s encoding string by using the @encode directive: (this stuff is at this gist)
NSLog (@"int: %s", @encode(int));
NSLog (@"CGRect: %s", @encode(CGRect));
NSLog (@"NSString *: %s", @encode(NSString *));
This prints out:
int: i
CGRect: {CGRect={CGPoint=dd}{CGSize=dd}}
NSString *: @
Lower-case i for an int. CGRect is a {struct} with two structs, each of which has two doubles. @ is an object pointer. You can see the list of encodings, or you can also ask Uncle Google for “objective-C runtime programming guide type encodings” for when Apple breaks this documentation link. The particular characters and what they correspond to are an implementation detail, so don’t go hardcoding “{_NSRange=QQ}”. You can use @encode to get the proper encoding string in a robust manner.
Here, again, is the signature for rangeOfCharacterFromSet…, annotated:
5 arguments
0 -> @ object pointer
1 -> : selector
2 -> @ object pointer
3 -> Q unsigned long long
4 -> {_NSRange=QQ} NSRange struct, with two unsigned long longs
returning {_NSRange=QQ} NSRange struct with two unsigned long longs
And now things should make more sense. The first two arguments are, you guessed it, self and _cmd. Then follow the three arguments to the method – an object pointer (to an NSCharacterSet), a big int used for a bit mask, and an NSRange. It returns an NSRange.
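If you find yourself squinting at these strings in a debugger, a tiny decoder can help. The sketch below is plain C and covers only the handful of codes that show up in this article; describe_encoding and struct_name are hypothetical helpers, not runtime functions, and the full table of codes is much longer. Because the characters are an implementation detail, treat this as a debugging aid for reading log output and keep using @encode when real code needs to compare encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Describe the leading character of a type-encoding string.
   Covers only the codes that appear in this article. */
const char *describe_encoding(const char *encoding)
{
    assert(encoding != NULL);
    switch (encoding[0]) {
        case '@': return "object pointer";
        case ':': return "selector";
        case 'i': return "int";
        case 'd': return "double";
        case 'Q': return "unsigned long long";
        case '{': return "struct";
        default:  return "unknown";
    }
}

/* Copy the struct name out of an encoding such as "{_NSRange=QQ}".
   Returns the number of characters copied, or 0 if the encoding is not a struct.
   The name is truncated if it does not fit in the output buffer. */
size_t struct_name(const char *encoding, char *out, size_t out_size)
{
    assert(encoding != NULL && out != NULL && out_size > 0);
    size_t length = 0;
    if (encoding[0] == '{') {
        length = strcspn(encoding + 1, "=}");           /* the name ends at '=' or '}' */
        if (length >= out_size) length = out_size - 1;  /* leave room for the terminator */
        memcpy(out, encoding + 1, length);
    }
    out[length] = '\0';
    return length;
}
```

Feeding it the five argument encodings from the annotated listing gives back the same descriptions as the right-hand column.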
There’s one caveat: the type encodings don’t handle variable argument lists. You can’t tell with NSMethodSignature if something is a varargs method or not.
Armed with this, you can figure out at run time what the calling convention is for an arbitrary method so long as you know its selector. Sure, that’s interesting trivia, but can it be useful information?
If you know the signature for a method, and are savvy with the platform ABI, you can package up message-sends into an object. In essence, you’re creating the potential for a message send. Then, in the future, you can take this package and cause it to actually send a message. You’re freeze-drying a method invocation for later thawing. Apple gives us a class to do this, hiding the grody details: NSInvocation.
Be warned, NSInvocation is kind of a pain to deal with. And its performance is terrible. Mike Ash has a test program (that you can run for yourself) which times various common operations. Here’s a subset, with the time for each operation in nanoseconds.
IMP-cached message send 0.7
C++ virtual method call 1.1
Objective-C message send 4.9
NSInvocation message send 77.3
The timings of the first three are unsurprising – an IMP-cached message send is just a function pointer call, so it’s very fast. A C++ virtual method call is a pointer + offset (find the vtable) followed by a pointer + offset (find the function pointer in the vtable) and then a function pointer call. It’s a little more work so takes a bit more time. An Objective-C message send does a fair amount of work as you’ve seen already.
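For contrast with the name-based lookup, here is roughly what the “pointer + offset, pointer + offset, call” sequence of a virtual call looks like when spelled out by hand in plain C. This is a sketch of the general vtable idea, not what any particular C++ compiler emits; Animal, AnimalVTable and the dog and bird functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Animal Animal;

/* The vtable: a fixed-layout struct of function pointers, one slot per virtual method. */
typedef struct {
    int (*legs)(const Animal *self);
    int (*volume)(const Animal *self);
} AnimalVTable;

struct Animal {
    const AnimalVTable *vtable;  /* found at a fixed offset inside the object */
    int loudness;
};

static int dog_legs(const Animal *self) { (void)self; return 4; }
static int dog_volume(const Animal *self) { return self->loudness * 2; }
static int bird_legs(const Animal *self) { (void)self; return 2; }
static int bird_volume(const Animal *self) { return self->loudness; }

static const AnimalVTable DogVTable = { dog_legs, dog_volume };
static const AnimalVTable BirdVTable = { bird_legs, bird_volume };

/* A hand-written "virtual call": load the vtable pointer, load the slot at its
   fixed offset, then jump through it. No names are compared at run time. */
int animal_volume(const Animal *animal)
{
    assert(animal != NULL && animal->vtable != NULL);
    return animal->vtable->volume(animal);
}

int animal_legs(const Animal *animal)
{
    assert(animal != NULL && animal->vtable != NULL);
    return animal->vtable->legs(animal);
}
```

The slot positions are fixed when the code is compiled, which is why this is cheaper than a lookup keyed by selector, and also why it can’t offer the same run-time flexibility.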
It takes about 15 times longer to invoke an already-existing NSInvocation object than to send the message directly. 77 nanoseconds is actually not a long time, so don’t avoid invocations if they can lead to elegant designs.
Making an NSInvocation is a multi-step process. Before showing the invocation, here’s some code that uses rangeOfCharacterFromSet:… that we’ll invocationize.
// (character indexes              11111111112222
// in the string)        012345678901234567890123
NSString *baseString = @"Why hello there, Hoover.";
// (randomRange)              |-------------|
NSRange randomRange = (NSRange){5, 15};
NSCharacterSet *set = [NSCharacterSet whitespaceCharacterSet];
NSStringCompareOptions options = NSBackwardsSearch;
NSRange lastSpace =
    [baseString rangeOfCharacterFromSet: set
                                options: options
                                  range: randomRange];
This is saying “given this string, look in the range of [5,20) for the first whitespace character, but start searching from the end”. In other words, what is the last whitespace character in that range? This call returns {16, 1}, which is the space right before Hoover.
First you need a method signature, otherwise how do you know what arguments to send to the IMP that backs rangeOfCharacterFromSet:…?
NSMethodSignature *rangeSignature =
[NSString instanceMethodSignatureForSelector:
@selector(rangeOfCharacterFromSet:options:range:)];
Then make an invocation:
NSInvocation *spaceFinder =
[NSInvocation invocationWithMethodSignature: rangeSignature];
Next, tell the invocation to retain any object arguments. Before ARC you would retain any objects that you put into invocations (and remembered to release them when done). We can’t do that with ARC. So tell the invocation to retain its arguments so they don’t disappear:
[spaceFinder retainArguments];
Set the target (self) and selector (_cmd):
[spaceFinder setTarget: baseString];
[spaceFinder setSelector: @selector(rangeOfCharacterFromSet:options:range:)];
Then set the three arguments for the method. Start with argument index 2.
[spaceFinder setArgument: &set atIndex: 2]; // target=0, selector=1
[spaceFinder setArgument: &options atIndex: 3];
[spaceFinder setArgument: &randomRange atIndex: 4];
And you’re done! Invoke it to cause the message send to happen.
[spaceFinder invoke];
And then print the return value.
Uh… Where did the return value go? It gets stuffed into the invocation:
NSRange anotherLastSpace;
[spaceFinder getReturnValue: &anotherLastSpace];
This also returns {16, 1}, so life is good. You can re-use the invocation and point it at different strings by using setTarget:
//                                   1111111111
//                         01234567890123456789
NSString *secondString = @"I seem to be a verb!";
//                              |-------------|
[spaceFinder setTarget: secondString];
[spaceFinder invoke];
NSRange yetAnotherLastSpace;
[spaceFinder getReturnValue: &yetAnotherLastSpace];
This returns a range of {14, 1}, which is the space right before “verb”.
You can reuse an invocation any number of times.
What’s neat about NSInvocation is it takes message-sends, which are fundamentally verb-like in nature, and converts them into objects, which are fundamentally noun-like in nature. You can put these invocations into collections, where they sit, lurking, until called into action.
NSUndoManager is fundamentally a couple of NSArrays filled with NSInvocations. You can use invocations to make C callback handling easier. Rather than writing a thunk method that casts a context pointer to an object, and then calling a hard-coded method, you could instead use an NSInvocation as the context pointer, and have a single generic callback. They’re also used under some circumstances when messages are sent to objects, as you’ll see in the next installment. You can also use an invocation as an operation by putting an NSInvocationOperation onto an NSOperationQueue.
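The “couple of arrays filled with frozen calls” idea is small enough to sketch in plain C. What follows is a toy and makes no claim about how NSUndoManager is actually built: Call, CallStack, ToyUndoManager and the toy_ functions are invented names, a Call stands in for an NSInvocation, and the only undoable action is setting an int. Each edit pushes the call that would put the old value back, and undoing one pushes its inverse onto the redo stack.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_CAPACITY = 16 };

/* A frozen call: which function to run, on what target, with what argument. */
typedef struct {
    void (*action)(int *target, int argument);
    int *target;
    int argument;
} Call;

typedef struct {
    Call items[STACK_CAPACITY];
    size_t count;
} CallStack;

typedef struct {
    CallStack undo;
    CallStack redo;
} ToyUndoManager;

static void push(CallStack *stack, Call call)
{
    assert(stack->count < STACK_CAPACITY);
    stack->items[stack->count++] = call;
}

static Call pop(CallStack *stack)
{
    assert(stack->count > 0);
    return stack->items[--stack->count];
}

static void set_value(int *target, int argument) { *target = argument; }

void toy_undo_init(ToyUndoManager *manager)
{
    assert(manager != NULL);
    manager->undo.count = 0;
    manager->redo.count = 0;
}

/* Change *target, first recording the call that would restore the old value. */
void toy_set(ToyUndoManager *manager, int *target, int new_value)
{
    Call restore = { set_value, target, *target };
    push(&manager->undo, restore);
    manager->redo.count = 0;  /* a fresh edit throws away the redo history */
    *target = new_value;
}

/* Pop the newest frozen call, record its inverse on the other stack, then run it.
   Returns 1 if a call was replayed, 0 if the stack was empty. */
static int replay(CallStack *from, CallStack *to)
{
    if (from->count == 0) return 0;
    Call call = pop(from);
    Call inverse = { call.action, call.target, *call.target };
    push(to, inverse);
    call.action(call.target, call.argument);
    return 1;
}

int toy_undo(ToyUndoManager *manager) { return replay(&manager->undo, &manager->redo); }
int toy_redo(ToyUndoManager *manager) { return replay(&manager->redo, &manager->undo); }
```

Nothing in the stacks knows what the calls mean; they just sit there until replay jumps through the stored function pointer, which is the noun-from-a-verb trick in its smallest form.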
The second neat thing is just an observation. Notice that the setArgument: methods take addresses of stuff, like the address of the search options mask, or the address of the range to limit the character search in. There are no sizeofs anywhere to let NSInvocation know how many bytes to grab from memory. There’s no need to – all that information is in the NSMethodSignature!
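That observation is easy to see in a stripped-down model. The plain C sketch below is hypothetical throughout (ToySignature, ToyInvocation and the toy_ functions are invented, and there is no target or selector, so argument indexes start at 0 instead of 2). The signature records only byte sizes, and the setter takes nothing but an address and an index, copying however many bytes the signature says that argument occupies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_ARGS = 4, MAX_ARG_SIZE = 16 };

/* Toy "method signature": just the byte size of each argument and of the return value. */
typedef struct {
    size_t arg_count;
    size_t arg_sizes[MAX_ARGS];
    size_t return_size;
} ToySignature;

/* The code behind an invocation reads raw argument storage and writes a raw result. */
typedef void (*ToyBody)(unsigned char args[][MAX_ARG_SIZE], void *result);

typedef struct {
    const ToySignature *signature;
    ToyBody body;
    unsigned char args[MAX_ARGS][MAX_ARG_SIZE];  /* private copies of the arguments */
    unsigned char result[MAX_ARG_SIZE];
} ToyInvocation;

void toy_init(ToyInvocation *invocation, const ToySignature *signature, ToyBody body)
{
    assert(invocation != NULL && signature != NULL && body != NULL);
    assert(signature->arg_count <= MAX_ARGS && signature->return_size <= MAX_ARG_SIZE);
    memset(invocation, 0, sizeof *invocation);
    invocation->signature = signature;
    invocation->body = body;
}

/* Takes only an address: the number of bytes to copy comes from the signature. */
void toy_set_argument(ToyInvocation *invocation, const void *value, size_t index)
{
    assert(invocation != NULL && value != NULL);
    assert(index < invocation->signature->arg_count);
    assert(invocation->signature->arg_sizes[index] <= MAX_ARG_SIZE);
    memcpy(invocation->args[index], value, invocation->signature->arg_sizes[index]);
}

void toy_invoke(ToyInvocation *invocation)
{
    assert(invocation != NULL);
    invocation->body(invocation->args, invocation->result);
}

void toy_get_return_value(const ToyInvocation *invocation, void *out)
{
    assert(invocation != NULL && out != NULL);
    memcpy(out, invocation->result, invocation->signature->return_size);
}

/* Example body: multiplies an int by a double and returns a double. */
static void scale_body(unsigned char args[][MAX_ARG_SIZE], void *result)
{
    int count;
    double factor;
    memcpy(&count, args[0], sizeof count);
    memcpy(&factor, args[1], sizeof factor);
    double product = count * factor;
    memcpy(result, &product, sizeof product);
}

static const ToySignature scale_signature = { 2, { sizeof(int), sizeof(double) }, sizeof(double) };
```

Because the arguments are copied at set time, changing the original variables afterwards does not change what a later toy_invoke sees, and the same invocation can be invoked again after swapping in one new argument.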
Here ends the tour of some of the bits of information you can get at run-time given Objective-C’s rich metadata. Next time, a tour of some of the methods you can use to put this metadata to good use.