Daniel Sperl, developer of the Sparrow Framework, recently posted a performance comparison on the Apple Developer forum where Sparrow ran 2.5 times faster with MRC code than the version upgraded to ARC.

A curious finding though it seemed very far off from real world observations. Being a synthetic benchmark no less. I decided to do a similar test based on the same code comparing cocos2d v2 and v3.

Fortunately cocos2d-iphone v3 has made a similar switch from MRC (v2.1 and earlier) to ARC (v3 preview). Unfortunately the internals of cocos2d also changed to some extent, for example custom collection classes written in C were replaced by Core Foundation classes. I don’t have a full overview of the changes, but at least the renderer doesn’t seem to have changed in any significant way. Yet.

So while comparability is good, it’s not like Sparrow where truly the only changes made were converting the code from ARC back to MRC. Take the following benchmark results and comparisons with two grains of salt and pepper on the side.

ARC vs MRC

The original benchmark done with Sparrow has seen MRC perform 2.5 times better than ARC in a synthetic “draw as many sprites as possible until framerate has dropped consistently below 30 fps” test:

sparrow_arc_vs_mrc

Continue reading »

Or so goes the argument. Still.

I wish Apple would just pull the plug and completely remove MRC support from LLVM. I’m getting tired, annoyed and sometimes angry when I browse stackoverflow.com and frequently find MRC code samples containing one or more blatant memory management issues.

Before I rant any further, this article is about testing the performance difference of ARC vs MRC code. I provide some examples, and the updated performance measurement project I’ve used before for cocos2d performance analysis, and the results of the full run at the bottom. I also split it into both synthetic low-level tests and closer to real-world algorithms to prove not one but two points:

ARC is generally faster, and ARC can indeed be slower - but that’s no reason to dismiss it altogether.

Measuring & Comparing Objective-C ARC vs MRC performance

Without further ado, here are the results of the low-level MRC vs ARC performance tests, obtained from an iPod touch 5th generation with compiler optimizations enabled (release build):

Continue reading »

Screen Shot 2013-01-17 at 01.46.23I’m currently working on a new tilemap renderer for KoboldTouch.

I now have an early version that’s fairly complete and does most of what cocos2d’s tilemap renderer can do. Pun intended: yes, cocos2d’s tilemap renderer really doesn’t do all that much: load and display tilemaps with multiple layers.

In fact my current implementation is one step ahead already:

KoboldTouch’s tilemap renderer doesn’t require you to use -hd/-ipad/-ipadhd TMX files and the related (often hard to use or buggy/broken) TMX scaling tools. Just use the same TMX file designed for standard resolution, then simply provide just the tileset images in the various sizes with the corresponding -hd/-ipad/-ipadhd suffixes. The tilemap looks the same on a Retina device, just with more image detail.

Performance Comparison

Anyhow, I thought I’ll do some quick performance tests. I have a test map with 2 layers and a tiny tileset (3 tiles, 40×40 points). I’m comparing both in the same KoboldTouch project, using the slim MVC wrapper (named KTLegacyTilemapViewController) for cocos2d’s tilemap renderer CCTMXTiledMap.

Continue reading »

Cocos2D Webcam Viewer, Part 2: Asynchronous Texture Loading

On February 23, 2012, in idevblogaday, by Steffen Itterheim

I updated the Cocos2D Webcam Viewer project from a previous article to download a file from the web asynchronously, and then load its texture asynchronously as well. You can now switch between the two modes to see how asynchronous operations almost completely removed the pauses the app experiences in synchronous mode. Just tap the screen to switch modes.

To visualize the lag I added a constantly moving sprite at the bottom. This makes the lag easier to spot than a framerate counter. I also removed all error checking code from this article to make the code easier to read. As always you can find the Cocos2D Webcam Viewer source code with full error checking on the LearnCocos2D github repository.

Continue reading »

In Depth iOS & Cocos2D Performance Analysis with Test Project

On November 17, 2011, in idevblogaday, by Steffen Itterheim

I took Mike Ash’s performance measuring code from 2008 with the improvements made by Stuart Carnie in early 2010 and turned that into a performance measuring project for 2012.

I know it’s still 2011, consider this a forward-looking statement. In any case, the test project is available for download, ready to run, includes Cocos2D v1.0.1 and is relatively easy to modify for your own needs. This project is also available on my github repository where I host all of the iDevBlogADay source code.

Since numbers are so dry and hard to assess, you’ll find the rest of this post garnered with charts and conclusions based on the results obtained from iPhone 3G, iPod 4 and iPad.

Continue reading »

Cocos2D Sprite-Batch Performance Test

On September 8, 2011, in cocos2d, idevblogaday, Kobold2D, by Steffen Itterheim

While writing the Learn Cocos2D book I was surprised to find that Cocos2D’s CCSpriteBatchNode was only able to increase the performance of several hundred bullet sprites on screen by about 10-15% (20 to 22.5 fps). I wanted to re-visit that scenario for a long time because as far as I understood, the more sprites I was drawing the greater the impact of CCSpriteBatchNode should be.

But even Cocos2D’s own sprite performance tests (compare columns 9 and 10) revealed a performance difference of under 20% (39 to 42 fps). It’s only when all sprites are scaled and rotated, or most of them are outside the screen area, that sprite batching seems to have a bigger impact (25 to 60 fps). Surely that scenario is not applicable to most games. So I started investigating.

Continue reading »

This Kobold2D FAQ article explains the difference between Corona SDK and iPhone Wax library, and evaluates the existing and future options for Lua scripting in Kobold2D.

Continue reading »

iPhone Performance Killers

On March 8, 2011, in Programming, by Steffen Itterheim

Have a look at the following code, and then answer these questions before reading on:

  1. Which function will run faster?
  2. What will be the framerate for each function when run 100 times per frame on an iPhone 3G?
  3. Will wrapping the 100 calls to function1 in an NSAutoreleasePool show any difference?

[cc lang=”ObjC” height=”465″]
-(void) function1
{
CGPoint pos = [self position];
id x = [NSNumber numberWithFloat:pos.x];
id y = [NSNumber numberWithFloat:pos.y];
id objects = [NSArray arrayWithObjects:x, y, nil];
id keys = [NSArray arrayWithObjects:@”x”, @”y”, nil];
id dict = [NSDictionary dictionaryWithObjects:objects forKeys:keys];
dict; // avoid compiler warning, is a noop
}

-(void) function2
{
CGPoint pos = [self position];
id x = [[NSNumber alloc] initWithFloat:pos.x];
id y = [[NSNumber alloc] initWithFloat:pos.y];
id objects = [[NSArray alloc] initWithObjects:x, y, nil];
id keys = [[NSArray alloc] initWithObjects:@”x”, @”y”, nil];
id dict = [[NSDictionary alloc] initWithObjects:objects forKeys:keys];
[x release];
[y release];
[objects release];
[keys release];
[dict release];
}
[/cc]

The Answers

  1. Which function will run faster? Answer: function1
  2. What will be the framerate for each function when run 100 times per frame on an iPhone 3G? Answer: 27 fps for function1 and 24 fps for function2.
  3. Will wrapping the 100 calls to function1 in an NSAutoreleasePool show any difference? Answer: no, but memory of temporary objects is released immediately.

Needless to say, on an iPod (4th Generation) and an iPad these tests all run at 60 fps and give no indication whatsoever that the performance on an iPhone 3G would suffer this much (and neither does the Simulator, of course). All the more reason to test early and often on older devices.

To autorelease or not?

Common wisdom may tell you that alloc/release is faster than autorelease. Even Apple recommends avoiding autorelease, right?

Not quite, because this is often misunderstood: Apple recommends to avoid autorelease but only for functions which create a lot of temporary objects and because of the constrained memory - not because it’s slow or even dangerous - autorelease is not dangerous.

Since memory is so constrained on 1st and 2nd generation iOS devices, it’s best to release that memory as soon as possible and don’t leave it allocated for longer than necessary. To achieve this, you can choose to do two things in this case: use alloc/release or enclose the loop in an NSAutoreleasePool. The latter option is preferred since it will release the memory right away, and not some time later. And autorelease is generally preferable because you will never, ever forget to send a release message to an object - which means it’ll be leaked and forever use up memory.

You can write well-performing, even better-performing code by using autorelease and using NSAutoreleasePool around tight loops creating many temporary autorelease objects.

Innocent looking code kills framerate

Did you expect that creating 100 rather simple NSDictionary instances each frame would drag the framerate down to around 24-27 fps? Me neither. I knew the code wasn’t going to be blazing fast, but I never expected it to have such an impact. However, it can be optimized somewhat since I’m unnecessarily creating two NSArray instances to hold the keys and values respectively before using them to create the NSDictionary. In fact we can get rid of them by using dictionaryWithObjectsAndKeys and doing this in a single step:

[cc lang=”ObjC”]
-(void) function1Optimized
{
CGPoint pos = [self position];
id x = [NSNumber numberWithFloat:pos.x];
id y = [NSNumber numberWithFloat:pos.y];
id dict = [NSDictionary dictionaryWithObjectsAndKeys:x, @”x”, y, @”y”, nil];
dict; // avoid compiler warning, is a noop
}
[/cc]

Sometimes it helps to look around what other ways there are to run the same code. In terms of performance this is an order of a magnitude faster and now clocks in at 42 fps. Still not good enough for realtime rendering obviously but an improvement of over 50% by cutting two NSArray allocations is a very simple and effective optimization.

Just as a general guideline, when I get rid of the two NSNumber instances and simply pass empty strings for x and y the framerate went back up to 60 fps. Of course that’s over-optimizing to the point where the code doesn’t work anymore. It just goes to show how expensive the creation of NSDictionary and NSArray are, as is wrapping simple types in NSNumber or NSValue objects.

If you can avoid allocation and temporary objects, avoid it. If you can’t, at least avoid creating temporary objects every frame. Re-use objects as much as possible. Unfortunately, that’s not an option for NSNumber objects since you can’t change the value of a NSNumber instance.

Page 1 of 212