EDIT: I've reported this as a bug on Adobe's Jira; go there and vote for it: http://bugs.adobe.com/jira/browse/SDK-29904
EDIT: I've posted a follow up to this post using the non-debug player: http://jackviers.blogspot.com/2011/05/flex-3-vs-flex-4-follow-up.html
A week ago, a friend and respected colleague of mine, Paul Smith IV, mentioned in passing that Flex 4 was slower than Flex 3. He had no quantitative data to back up the claim, and a quick search on Google didn't turn up anything either. Since I was about to embark on a large migration of a Flex 3 codebase at work, I decided that a performance comparison was in order. My findings show that the average Flex 4 component is 163.41% slower (531.77 ms) than its direct Flex 3 counterpart. That difference increases application startup time linearly in statically defined applications, and it persists, following a generally logarithmic pattern, as components are added dynamically to the application view. Though Flex 4 applications run slower, they produce .swf files 82.26% smaller than Flex 3 applications. On a 512 kbps line with 10% overhead, this results in 88.89% faster download times with Flex 4 than with Flex 3. If the average human speed of perception is 16 ms, the perceived performance gain using Flex 4 is 87.97% over Flex 3. Because this perceived gain applies only to the initial load, two questions arise: Should Flex 4 be used in large, dynamically downloaded applications? Do the benefits of code reuse, easier view management, and separation of concerns inherent in Flex 4's Spark component architecture outweigh the runtime performance degradation relative to Flex 3?
Test Machine Hardware and OS
MACHINE: MacBookPro5.2
PROCESSOR NAME: Intel Core 2 Duo
PROCESSOR SPEED: 3.06 GHz
RAM: 4 GB
OPERATING SYSTEM: Mac OS X 10.6.6 (10J567)
Kernel: Darwin 10.6.0
Browser and Flash Player Version
BROWSER VENDOR: Mozilla
BROWSER VERSION: Firefox 3.6.15
FLASH PLAYER VERSION: 10.2.152.33 Debug
Tests
Flex3PerformanceTest
StaticFlex3PerformanceTest100
StaticFlex3PerformanceTest150
StaticFlex3PerformanceTest200
StaticFlex3PerformanceTest250
StaticFlex3PerformanceTest300
StaticFlex3PerformanceTest600
StaticFlex3PerformanceTest1000
Flex3DynamicPerformanceTest
Flex4PerformanceTest
StaticFlex4PerformanceTest100
StaticFlex4PerformanceTest150
StaticFlex4PerformanceTest200
StaticFlex4PerformanceTest250
StaticFlex4PerformanceTest300
StaticFlex4PerformanceTest600
StaticFlex4PerformanceTest1000
Flex4DynamicPerformanceTest
Test Source Code
The source code for these tests can be retrieved and viewed from GitHub at https://github.com/jackcviers/Flex-3---Flex-4-Performance-Test.
Test SDKs
Flex 3: Flex 3.5
Flex 4: 4.1.0.16076
Test Methodology
According to Adobe's DevNet article "Differences between Flex 3 and 4", the following are the Flex 3 components and their new Flex 4 counterparts, separated by a "/" character (original table):
Fig 1: Flex 3 Components and Flex 4 analogues.
For the tests to be valid, I trimmed out the components that had Flex 4 spark.primitives.* analogues, because the Flex 4 primitives are not "components" and do not adhere to the Flex Component Lifecycle. I also used mx.controls.Image in both Flex 3 and Flex 4 as there is no analogue in Flex 4 for the behavior of mx.controls.Image present in Flex 3.
I then set up three types of tests. The first statically defined an application containing each of the components in Fig. 1 above, using the Flex 3 MX components for the Flex 3 version of the test and the Flex 4 Spark components for the Flex 4 version. The second took the longest running component from the first test and checked whether the number of instances statically defined at author time had any effect on performance. The third checked whether run time changed as the longest running component from Test 1 was added to the stage dynamically.
Test 1
Two Flex Projects were created in Flash Builder 4, one for Flex 3 and one for Flex 4. The applications' Flex lifecycle events (preinitialize, initialize, creationComplete, and applicationComplete) were bound to corresponding handlers that recorded the time of each event in a dynamic AS3 Object instance. The object instance's top level key was "testerApp". The times were stored in a secondary AS3 Object instance nested under the "testerApp" key as "preinitializeTime", "initializeTime", "creationCompleteTime" and "applicationCompleteTime". Each component defined in MXML was bound to the Flex lifecycle events and handlers for "preinitialize", "initialize", and "creationComplete". The only difference between the handlers in the two test applications was that the Flex 4 version cast the IVisualElement instances to UIComponent instances to obtain the components' ids, which were used as keys in the time recording Object instance. At "applicationComplete" the application looped through all of its immediate visual children, tracing each component's id property along with the time elapsed for that component between "initialize" and "creationComplete".
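The recording structure described above can be sketched in a few lines. This is a language-neutral illustration written in Python, not the ActionScript used in the actual tests (see the GitHub repository for that); the class name `LifecycleRecorder`, its methods, and the timestamps are all hypothetical.

```python
from collections import defaultdict


class LifecycleRecorder:
    """Stores one timestamp per (component id, lifecycle event) pair."""

    def __init__(self):
        # Maps a component id to a dict such as {"initializeTime": 10, ...}
        self.times = defaultdict(dict)

    def record(self, component_id, event, timestamp_ms):
        # Mirrors the "<event>Time" key naming described in the post.
        self.times[component_id][event + "Time"] = timestamp_ms

    def elapsed(self, component_id, start="initialize", end="creationComplete"):
        entry = self.times[component_id]
        return entry[end + "Time"] - entry[start + "Time"]

    def report(self, skip=("testerApp",)):
        # Per-component time between "initialize" and "creationComplete".
        return {cid: self.elapsed(cid) for cid in self.times if cid not in skip}


recorder = LifecycleRecorder()
# Made-up timestamps in milliseconds, for illustration only.
recorder.record("testerApp", "preinitialize", 0)
recorder.record("mxButton", "initialize", 10)
recorder.record("mxButton", "creationComplete", 35)
print(recorder.report())
```

Keying the records by component id is what makes the final loop at "applicationComplete" a simple lookup per child.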
Test 2
In the same two Flex Projects, I created several more applications to test the longest running component from Test 1 above (in both cases the mx/spark Button component). Each application declared the buttons statically at author time, without a Repeater, in counts of 100, 150, 200, 250, 300, 600, and 1000 instances. Each component and the application were bound to lifecycle events and measured as in Test 1, with the results traced as in Test 1.
Test 3
In the same two Flex Projects I created an additional application to test dynamic addition at runtime of the longest running components from Test 1 above (again, mx/spark Button components). In this version of the test I added a timer that added the instances dynamically every 10 ms for 1000 repetitions and measured the results as in the above two tests after the 1000th repetition. The timer was created in the applications' preinitialize event handler and started at application complete.
Expectations and Assumptions
Before running the tests, I expected Flex 4 to be an improvement over Flex 3 in performance. The separation of concerns in the Flex 4 architecture allows framework developers to reuse more business/behavioral classes across the vastly different looks and feels that common control components need. The visual characteristics for the skins should have ended up in smaller and more performant classes. Bindings should have been reduced, and the code reuse across components should have left more time for testing and improving code performance. Additionally, the base classes of both frameworks existed in the previous Flex 3 release and should have been improved by the community in the period between the 3.5 and 4.1 releases.
Results
To see the Raw Data, download the enclosed Excel Spreadsheet.
Test 1
Component | Difference (Flex 4 - Flex 3) | Percentage Increase
mxButton/sButton | 592 | 147.26%
mxButtonBar/sButtonBar | 600 | 156.66%
mxCheckBox/sCheckBox | 602 | 160.53%
mxComboBox/sDropDownList | 600 | 163.04%
mxHorizontalList/sHList | 601 | 170.25%
mxHScrollBar/sHScrollBar | 585 | 167.14%
mxHSlider/sHSlider | 591 | 173.31%
mxImage/mxImage | 579 | 170.80%
mxLinkBar/sLinkBar | 571 | 171.99%
mxLinkButton/linkButton | 572 | 174.39%
mxList/sList | 571 | 175.69%
mxNumericStepper/sNumericStepper | 556 | 171.60%
mxRadioButton/sRadioButton | 520 | 163.01%
mxTextArea/sTextArea | 513 | 161.32%
mxTabBar/sTabBar | 500 | 158.73%
mxTextInput/sTextInput | 504 | 162.58%
mxTileList/sTileList | 501 | 162.66%
mxToggleButtonBar/sToggleButtonBar | 489 | 159.28%
mxVScrollBar/sVScrollBar | 493 | 162.71%
mxVSlider/sVSlider | 488 | 162.67%
mxCanvas/sGroup | 485 | 163.85%
mxControlBar/sControlBar | 483 | 163.18%
mxHBox/sHGroup | 467 | 158.31%
mxVBox/sVGroup | 467 | 158.31%
mxPanel/sPanel | 469 | 160.62%
mxTile/sTile | 427 | 148.78%
Total Time Application | 671 | 143.68%
Total Time Components | 13826 | 163.41%
Total Component Classes | -9 | -33.33%
Lines of Code | 474 | 112.86%
Application Size | -453500 | -82.26%
Download Time | -8 | -88.89%
Perceived Performance | -520.5 | -87.97%
Fig 2. Actual component run times from Flex 4 (red) and Flex 3 (blue). "mx" precedes component names in Flex 3; "s" precedes component names in Flex 4.
What is particularly interesting in both the table and chart above is that Flex 4 is more than 1.5 times slower at run time than Flex 3 (a 163% increase in total component time). However, the file produced by Flex 4 is 82% smaller, despite needing 474 more lines of MXML and ActionScript 3 code. I believe that the decrease in file size is a direct result of the code reuse enabled by the new skinning architecture in Flex 4. As you can see, nine fewer component classes were needed to create the analogous test applications.
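The summary figures can be re-derived from the per-component rows of the Test 1 table. The sketch below sums the 26 differences and averages them. It also backs out the implied totals, on the assumption (not stated in the post) that "Percentage Increase" means the difference divided by the Flex 3 time, which is how the Test 2 table's percentages work out.

```python
# Per-component differences in ms (Flex 4 minus Flex 3), copied from the
# Test 1 table in row order.
differences_ms = [
    592, 600, 602, 600, 601, 585, 591, 579, 571, 572, 571, 556, 520,
    513, 500, 504, 501, 489, 493, 488, 485, 483, 467, 467, 469, 427,
]

total_difference_ms = sum(differences_ms)
average_difference_ms = total_difference_ms / len(differences_ms)

# Assumption: percentage increase = difference / Flex 3 time.
reported_increase = 163.41 / 100
implied_flex3_total_ms = total_difference_ms / reported_increase
implied_flex4_total_ms = implied_flex3_total_ms + total_difference_ms
run_time_ratio = implied_flex4_total_ms / implied_flex3_total_ms

print(f"total difference: {total_difference_ms} ms")
print(f"average per component: {average_difference_ms:.2f} ms")
print(f"Flex 4 / Flex 3 run time ratio: {run_time_ratio:.2f}")
```

Under that assumption, a 163.41% increase means the Flex 4 components take a little over 2.6 times as long in total as the Flex 3 ones.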
To calculate perceived performance, I used 16 ms as the time it takes for the human eye to perceive change. Since the download time of the Flex 4 swf on anything faster than 512 kbps came out to 0 seconds, I used 512 kbps to estimate download time. The Flex 4 swf downloads 8 seconds faster than the larger Flex 3 swf, an ~89% improvement in download time. This results in a perceived performance gain for the Flex 4 application: though it runs slower, it downloads much faster and can start earlier after page load than the Flex 3 application. Because the test applications were small (only 26 components each), and because application complete time was less than the total component rendering time, I decided that further testing was necessary, and chose to compare the run time of statically defined Flex 4 applications against statically defined Flex 3 applications. To do this, I selected the poorest performing component from each framework (the Button component) and tested with 100, 150, 200, 250, 300, 600, and 1,000 buttons on the display list. I wanted to test all the way to 20,000, but the mxmlc build ran out of memory at 10,000 buttons in Flex 4.
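The download figures in the Test 1 table (-8 and -88.89%) can be approximated from the size figures alone. The sketch below is a reconstruction, not the spreadsheet's formula. It assumes that 512 kbps means 512,000 bits per second, that "10% overhead" multiplies the transfer time by 1.1, and that times are truncated to whole seconds. The swf sizes are backed out of the reported -453500 byte and -82.26% figures.

```python
import math

SIZE_DIFFERENCE_BYTES = 453_500   # Flex 3 swf minus Flex 4 swf (from the table)
SIZE_REDUCTION = 0.8226           # 82.26% smaller (from the table)
LINE_BITS_PER_SECOND = 512_000    # assumption: 1 kbps = 1000 bits per second
OVERHEAD = 0.10                   # assumption: overhead adds 10% to transfer time

# Implied file sizes, derived from the two reported size figures.
flex3_bytes = SIZE_DIFFERENCE_BYTES / SIZE_REDUCTION
flex4_bytes = flex3_bytes - SIZE_DIFFERENCE_BYTES


def download_seconds(size_bytes):
    """Estimated transfer time for a file of the given size."""
    return size_bytes * 8 / LINE_BITS_PER_SECOND * (1 + OVERHEAD)


# Assumption: the spreadsheet worked in whole seconds.
flex3_seconds = math.floor(download_seconds(flex3_bytes))
flex4_seconds = math.floor(download_seconds(flex4_bytes))
seconds_saved = flex3_seconds - flex4_seconds
percent_saved = seconds_saved / flex3_seconds * 100

print(flex3_seconds, flex4_seconds, seconds_saved, round(percent_saved, 2))
```

With whole-second truncation the estimate matches the table's download row; without it, the untruncated times give a somewhat smaller percentage, so the headline figure is sensitive to rounding.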
Test 2
For this test I was only interested in the time it took for the application to run between "preinitialize" and "applicationComplete". The results are below:
Buttons | Flex 3 Time (ms) | Flex 4 Time (ms) | Difference (ms) | Percentage Increase
50 | 190 | 437 | 247 | 130.00%
100 | 276 | 649 | 373 | 135.14%
150 | 324 | 856 | 532 | 164.20%
200 | 452 | 1112 | 660 | 146.02%
250 | 541 | 1339 | 798 | 147.50%
300 | 631 | 1515 | 884 | 140.10%
600 | 1244 | 2841 | 1597 | 128.38%
1000 | 1885 | 4574 | 2689 | 142.65%
As you can see, Flex 4 again was the poorer performer. I graphed the Flex 4 and Flex 3 application run times to visualize the difference:
Fig. 3 Flex 4 vs. Flex 3 Application Runtimes with trend lines and projected period results.
As you can see, Flex 3 and Flex 4 run times increase linearly with the number of components added, and Flex 4's rate of increase is higher. The projected results fit the observed data very well.
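The linear trend can be checked directly against the Test 2 table with an ordinary least-squares fit. The sketch below uses only the numbers from that table. The helper `fit_line` is written for this illustration and is not necessarily how the chart's trend lines were produced.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Data from the Test 2 table (button count, run time in ms).
buttons = [50, 100, 150, 200, 250, 300, 600, 1000]
flex3_ms = [190, 276, 324, 452, 541, 631, 1244, 1885]
flex4_ms = [437, 649, 856, 1112, 1339, 1515, 2841, 4574]

slope3, intercept3, r2_3 = fit_line(buttons, flex3_ms)
slope4, intercept4, r2_4 = fit_line(buttons, flex4_ms)

print(f"Flex 3: {slope3:.2f} ms per button, R^2 = {r2_3:.4f}")
print(f"Flex 4: {slope4:.2f} ms per button, R^2 = {r2_4:.4f}")
```

The fitted slopes come out at roughly 1.8 ms per additional button for Flex 3 and roughly 4.4 ms for Flex 4, with both fits explaining more than 99% of the variance, which is consistent with the linear reading of the chart.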
I also graphed the difference at each level:
Fig. 4 Difference in Flex 4 runtime vs. Flex 3 runtime by number of statically defined longest running components.
This difference also grew in a generally linear manner with the number of components. The number of components defined at author time and placed on the display list does affect overall application performance, and the results confirm that the Flex 4 architecture performs worse than Flex 3 when components are statically defined. However, in a small application like the one in Test 1, the smaller file size will still result in faster perceived performance.
This led me to wonder what would happen in a large, dynamic application. Would component run time increase linearly throughout the life of an application where components were added dynamically? Would dynamic addition of components occur more quickly in Flex 4 vs. Flex 3? Over time, in a large application, would runtime performance improve as a result of additional engineering for larger applications built into Flex 4's new architecture? So I wrote test number three, which builds off of tests 1 and 2. This time, I was interested in the component runtimes of dynamically added buttons, added one at a time to the display list at 10 millisecond intervals until the timer had run 1000 times. Then I would measure the component runtimes and graph the results.
Test 3
The results of Test 3 were much more complex than those of Tests 1 and 2. As you might expect, adding components right after the application complete event is much more expensive for a period of time; the cost then levels off and settles into a linear trend. The result table is long, but Flex 3 performed better again. What is most interesting is the visualization of the data:
Fig. 5 Flex 4 vs. Flex 3 Performance with dynamically added components.
This chart is very complex, because the data returned from this test was very complex. The Flex 4 results follow a generally logarithmic pattern: very high component run times until more than 60 components are added, then a steep linear trend from components 48 - 450, then a more gradual trend above 450 components. Flex 3 is also generally logarithmic and follows a similar pattern, but its trend lines start at a lower number of milliseconds than Flex 4's and finish lower than Flex 4's, which indicates better performance in this test.
Conclusions and Additional Points of Discussion
In all three tests above, Flex 4 performs more poorly than Flex 3 at run time. Each of the spark components performed more poorly than its mx counterpart, and the processing of the display list appears to have regressed from Flex 3's performance in both static and dynamic component addition. According to Test 1, the Flex 4 components perform on average 531.77 milliseconds, or 163.41%, worse than their Flex 3 counterparts. Flex 4 produces a much smaller swf executable than Flex 3, which improves the perceived performance of small applications, but in large, dynamic applications the improvement in download time will only be noticeable for the initial statically defined views and for module downloads. Any components dynamically added up to the 60th component will probably cause performance issues and a jittery UI. Although Flex 3 has a bigger initial hit, it performs better throughout the lifetime of the application.
Flex 4 has better look and feel extensibility of its built-in components, meaning that you won't be spending as much time in development extending and sometimes duplicating base flex components from a business/behavior logic standpoint, but this extensibility comes at a performance cost over 1.5 times worse on average per component to the end-user.
When we write software, we shouldn't be as concerned with how HARD that software is to write, maintain, and extend as we are with the end-user experience. The web has taught us that performance should be our number one concern when writing web applications. Flex 4 is easier to develop and maintain than Flex 3, and therefore less likely to be broken by a poorly skilled developer, but it should probably be avoided until Adobe and the Flex community can bring its performance on par with Flex 3. There is no reason that the overhead of separating view from behavior should have more than doubled component run time when the base codebase of both architectures should have improved from Flex 3 to Flex 4. The community of Flex Framework developers should probably be looking to improve performance in the Flex 4.5 release, rather than adding more components to the spark architecture.
What do you think? Feel free to download the source code from GitHub and the spreadsheets as well. Comment back. I'd love to see your feedback.
Interesting results.
I'd be interested in seeing these tests run without the debug player.
TJ,
You should be able to fire up Service Capture and test the release builds available from the GitHub repository and see the results in the non-debug player.
I will run those tests and post a follow-up to see if that affects the results. Thanks for the suggestion.
Tj, I ran the test in the non-debug player and results didn't change much:
Average Component Time Degradation in Flex 4 vs. Flex 3: 125.51% (427.4230769)
In other words, still too much slower.
You use RSL only in the Flex 4 project that's why there is such a difference in file size...
PeZ,
Yeah, I left the configurations what they were out of the box from Flash Builder. Do you have any idea if not using the RSL would improve the performance of the spark components? My gut feeling says no...
PeZ,
Statically linking the RSLs made it worse by 1% on average.
Hi Jack, this is a really interesting analysis. Have you considered looking at memory consumption after running your tests to make any comparisons between Flex 3 and 4 in that area?
Thanks a lot for these thorough tests. I also tested quickly and was disappointed by the performance decrease. I guess the problem comes from the fact that the new architecture uses twice as many UIComponent instances, and unfortunately this base class is one of the heaviest classes, with 600ß lines of code (without comments). Adobe should really take more care of performance!
@Cliff
Yeah, I have. Haven't had time lately since I have been on a marathon session for the last few weeks. It was good to get some downtime today! However, the code is available on GitHub, so if you want to add in some memory profiling and commit back, or wish to post your own results, I'd be happy to review and post a link here and possibly add the results to the Jira ticket for Adobe.
@manfred:
Yes, UIComponent is awfully heavy. However, I actually think the problem lies further up the chain in DisplayObject and IUIComponent, and the use of Arrays and ArrayCollection, and the absence of a VectorCollection. Array iteration is a very expensive operation, and when you know that all items in a particular collection are of a certain base type, using Vector would help speed up the iteration enormously.
In addition, the potential use of metadata tags to link components at runtime could pose an issue, since in order to read the metadata I imagine that they are using describeType, which is another very expensive operation.
Simply doubling the number of components to lay out at runtime probably isn't really the issue. I would presume that the skin UI component is created as a child visual element of the parent component in the generated ActionScript code, along with an added event handler that simply calls partAdded and partRemoved.
I would also like to point out that many mx containers use a similar scheme when using IItemRenderers, and the lists also doubled in execution time from Flex 3 to Flex 4.
No, I suspect that the compiler code was rushed to production, and doesn't optimize the displayList when converting it to ABC.
Haven't done a reverse-engineering study of the ABC produced by the two compilers, but I have a test showing that mx components rendered in Flex 4 in an mx Application also take longer than the same application compiled with Flex 3.
I ran out of time to run the comparison fully before I had to switch gears to do some production work, but when I get a chance those apps will be added to the Git repo and the results published as a follow-up to this post. That will not be today, as I am fried from the aforementioned marathon session I just went through.
@ALL: The issue in Jira has been replicated by Adobe QA, and has been passed to the Adobe Internal Review Board. So it looks like things are moving on this.
For all the .Netheads out there, a comparison between Silverlight and Flex 3 showed that Flex 3 outperformed it on all the tests, as did Flex 4. Silverlight produces a much smaller application file...but has several disadvantages outside of pure performance, including the inability to create custom "routed" events: events equivalent to events that bubble in AS3.
I'm also interested in looking into several "alternatives" or patches that may help fix the problem, including separating spark and mx components from the common UIComponent base class and implementing a separate SparkUIComponent that implements the necessary interfaces and wraps several of the displayList traversal methods in order to optimize their performance BEFORE bytecode conversion. This may take some time, and I'll let everyone know if I produce a working patch.
If anyone else has a workaround for this issue, please, please, please post it and submit it on the JIRA ticket for Adobe and the community to review and possibly apply to a new release build. I think we can all agree that the separation of concerns in spark is a much better solution than the typical "copy and paste" to override solution needed to skin many mx components (ever tried to make a slider with additional elements, stylized ticks (why oh why can't I have minor ticks on a slider?) or a gradient background track with more than one color or a custom background ratio? I know your pain).
Per your comment, "...I have a test showing that mx components rendered in Flex 4 in an mx Application also take longer than the same application compiled with Flex 3." I've modified some of your tests to use mx components within the Flex 4.5 SDK and found your suspicion to be correct, to the tune of another 15 - 20%. So, if anyone is thinking that they can fall back to mx for a little more speed when using Flex 4.x... you'll probably need to re-think that. It's things like this that make minimalcomps really appealing.