Understanding 64-bit in Snow Leopard

September 24th, 2009 by ken

We’ve all seen dozens of news reports and blog entries (complete with angsty blog comments) about Snow Leopard and its new 64-bit support. Two things become clear: 1) People desperately want 64-bit. 2) People have absolutely no idea what it means.

In defense of “people”, it’s not an easy concept for somebody without a computer science background to understand. Apple understands this, and thus doesn’t do tons of marketing around the 64-bit move. They wisely didn’t call it Leopard x64. Their 64-bit marketing webpage makes only a few understated claims about the benefits. According to Apple, it “boosts overall performance”. This is backed up by a graph indicating that 64-bit, along with other improvements in Snow Leopard, gives on average a 1.3x speed-up in “common operations”. They also tout two other benefits of 64-bit: the ability to address more RAM, and better security.

If you’re curious as to what it really means to be 64-bit and where the improvements come from, then please continue reading. I’m going to do my best to explain what all these bits are, and how it applies to a CPU, an application, and a kernel. I’ll also try to explain what part of this is new in Snow Leopard. To start, we’re going to go all the way back to the beginning. Way back.

Start with a simple thought experiment: Think about a flipper scoreboard like the one pictured here. This board allows 2 digits to represent a score, and each digit can show 10 possible numerals (0 through 9). The right digit is the ones column, and the left digit is the tens column. When we count, we flip the ones column, 1, 2, 3, 4, 5, 6, 7, 8 until we run out of numerals at 9. Then, we flip it back to zero and flip the tens column to get 10, and the pattern repeats. Simple, right? What’s the highest number that this 2-digit scoreboard can show? The range of the scoreboard is 00 to 99.

If we want to count higher than 99, we’d need a third digit (hundreds column) at which point we’d be able to count from 000 to 999.

When we count using digits, each new column added increases the range by a factor of 10.

But computers don’t use digits; they use bits. Instead of 10 possible numerals, each bit has only 2: 0 and 1. However, the rules of the game stay exactly the same. Counting in binary on a hypothetical 2-bit scoreboard goes 0, 1, and now we’ve already exhausted the ones column.

So, as before, we roll it back to zero and increment the next column (the 2’s column). What’s the highest number this 2-bit scoreboard can show? The range of the scoreboard is binary 00 to binary 11, which is 0 to 3 in decimal. If we want to represent a number higher than 3, we need to add another column.

When we count using bits, each new column added increases the range by a factor of 2.
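The rollover rule is the same in any base, so it can be sketched in a few lines of Python. This is only an illustration of the scoreboard idea; the function name `scoreboard_counts` is made up for this example.

```python
from itertools import product

def scoreboard_counts(symbols, columns):
    """List every reading of a scoreboard, in counting order.

    symbols: the numerals each column can show, e.g. "01" for binary
    columns: how many columns the scoreboard has
    """
    # product() varies the rightmost column fastest, which matches the
    # "flip the ones column, then roll over" rule of the scoreboard.
    return ["".join(reading) for reading in product(symbols, repeat=columns)]

binary_board = scoreboard_counts("01", 2)
decimal_board = scoreboard_counts("0123456789", 2)

print(binary_board)                                              # ['00', '01', '10', '11']
print(decimal_board[0], decimal_board[-1], len(decimal_board))   # 00 99 100
print(len(scoreboard_counts("01", 3)))                           # 8
```

Adding a third binary column takes the number of readings from 4 to 8, which is the doubling described above.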

When you hear 16-bit, 32-bit or 64-bit, it simply refers to how many columns of bits are available to represent a number. As it turns out, with 32 columns of binary, the highest number you can represent is 4,294,967,295 (about 4.3 billion). With 64 bits, you can count considerably higher, up to 18,446,744,073,709,551,615 (about 18.4 quintillion).
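Those limits follow directly from the doubling rule: n columns of bits give 2 to the power n combinations, counted from zero. A small Python sketch (the helper name `highest_value` is just for illustration):

```python
def highest_value(bits):
    """Largest unsigned number that fits in the given number of bits."""
    # n columns of binary give 2**n combinations, counted from 0 upward,
    # so the highest value is one less than 2**n.
    return 2**bits - 1

for bits in (2, 16, 32, 64):
    print(f"{bits:>2} bits: 0 to {highest_value(bits):,}")
```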

64-bit Software

While both limits might seem high, it’s much easier to come up with examples where software runs out of space in a 32-bit number. I’ll give three examples.

Computers keep track of file sizes by counting bytes. If a file grows beyond 4 gigabytes, the number of bytes will no longer fit in a 32-bit number. If you’ve been using computers long enough, you’ll be familiar with certain programs failing to work correctly with files larger than 4 GB.

Most modern computers keep track of the time by counting seconds since a fixed point in the past. 32-bit OS X counts seconds since January 1, 1970 using a signed 32-bit number. (Simply put, the first bit is used to indicate negative or positive, and earlier dates are represented as a negative number indicating seconds before 1970.) This leaves it with a range of -2,147,483,648 to 2,147,483,647. As of this writing, the seconds count is up to 1,252,260,035 and will run out of bits (overflow) some time in the year 2038. Snow Leopard stores this value using 64 bits, safely postponing the inevitable overflow until the year 292277026596.
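Both overflow dates can be checked with a little arithmetic. The sketch below is an illustration, not how OS X itself computes anything; it assumes the counter starts at midnight UTC and, for the 64-bit case, uses the average Gregorian year of 365.2425 days.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed counter can hold: one bit is the sign,
# leaving the remaining bits for the magnitude.
MAX_SIGNED_32 = 2**31 - 1
MAX_SIGNED_64 = 2**63 - 1

last_32bit_moment = EPOCH + timedelta(seconds=MAX_SIGNED_32)
print(last_32bit_moment)  # 2038-01-19 03:14:07+00:00

# datetime cannot represent years past 9999, so estimate the 64-bit
# limit arithmetically instead (assumption: 365.2425-day average year).
SECONDS_PER_YEAR = 31_556_952
last_64bit_year = 1970 + MAX_SIGNED_64 // SECONDS_PER_YEAR
print(last_64bit_year)
```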

Digital audio works by recording thousands of audio waveform values (samples) per second. Audio recorded at 48 kHz, a common professional rate slightly above the 44.1 kHz of “CD quality”, stores 48,000 samples every second. If audio software is counting samples in a 32-bit number, this allows for about 24.9 hours of recording before overflow occurs. Chances are you don’t often work with 25-hour audio files, but if you did, you’d probably notice strange bugs in your favorite audio editing software.
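The recording limit is a one-line division. Here is a rough Python sketch, assuming the software uses an unsigned 32-bit sample counter; the function name is hypothetical, and the 44.1 kHz audio CD rate is included only for comparison.

```python
def hours_until_overflow(sample_rate_hz, counter_bits=32):
    """Hours of audio an unsigned sample counter can index before wrapping."""
    max_samples = 2**counter_bits
    seconds = max_samples / sample_rate_hz
    return seconds / 3600

print(round(hours_until_overflow(48_000), 2))  # 24.86
print(round(hours_until_overflow(44_100), 2))  # 27.05
```

A signed 32-bit counter would hit its limit in half that time, since one bit is spent on the sign.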

Now, it’s worth noting that 32-bit systems actually can handle numbers larger than 4.3 billion. Clearly 32-bit computers can support files larger than 4 GB. The problems come in when programmers fail to anticipate that a value can get big enough to overflow. On a 32-bit system, programmers can explicitly request that the computer treat certain values as 64-bit. In that case, the 32-bit system simulates 64-bit computations, and everybody’s happy. However, it’s less efficient because simulating a single 64-bit operation requires several 32-bit operations, and programmers don’t always remember to explicitly ask for 64-bit numbers where necessary. For 64-bit OS X, Apple has chosen to adopt the LP64 data model, causing any “long” variables in pre-existing code to automatically be promoted to 64 bits, without requiring the code to be modified. The ability to do native 64-bit operations will increase performance in programs which make heavy use of large numbers such as file offsets and sample counts.
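To get a feel for why simulated 64-bit math costs more, here is a sketch of a 64-bit addition built only from 32-bit-sized pieces. It is written in Python for readability and is not what any particular compiler emits; the names `MASK32` and `add64_with_32bit_steps` are invented for this example.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits of a value

def add64_with_32bit_steps(a, b):
    """Add two 64-bit numbers using only 32-bit-sized pieces."""
    # Split each operand into a low half and a high half.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves; anything above 32 bits is the carry.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Step 2: add the high halves plus the carry, wrapping at 32 bits
    # the way a real 64-bit register would wrap at 64.
    hi = (a_hi + b_hi + carry) & MASK32

    # Step 3: glue the two halves back together.
    return (hi << 32) | lo

print(add64_with_32bit_steps(4_294_967_295, 1))  # 4294967296
```

One native 64-bit add becomes two adds plus the bookkeeping for the carry, and multiplication or division takes more steps still.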

64-bit CPU Architecture:

When we talk about a 64-bit CPU, there’s more to it than just the ability to count to higher numbers. Intel CPUs, when run in 64-bit mode, have a number of fundamental design changes which improve performance.

Think of a “register” as one of many scoreboards within the CPU. Computer programs use registers to store values during computations.

A 32-bit Intel CPU has 8 general purpose registers, each with 32 bits of capacity. The Intel 64-bit architecture expands these registers to 64 bits, but also, and probably more importantly, the new architecture doubles the number of general purpose registers to 16. A good analogy for this is that it’s like being able to hold more numbers in your head at once while doing mental math. Since computations can be completed in fewer steps, programs run faster, and this provides a lot of the performance boost in 64-bit code. Given Apple’s recent switch from PowerPC to Intel, it’s interesting to note that PowerPC CPUs have 32 general purpose registers.

There is another architecture change involving subroutine calling conventions. Sparing you a several-day lecture on assembly language, suffice it to say that the new method is more efficient than the method used in 32-bit Intel CPUs. The new, faster calling convention is not exactly like PowerPC calling conventions, but it is much closer to them.

One additional benefit of 64-bit architecture is a security feature known as the “NX bit”. It’s difficult to explain, but I will simply say that it gives a level of protection against running rogue code by requiring code in memory to be marked as such.

Memory:

Before going on to some of the most talked-about advantages of 64-bit, we need a quick primer on computer memory.

Before it can be used, all of the code and data a CPU needs has to be stored in RAM, also known as memory. If you were a Mac OS 9 user back in the day, then you might remember opening too many applications and seeing a message that you were out of memory. Since programs could not share memory, the solution was to quit some of your applications, or add more RAM chips. You don’t see that anymore nowadays, because computers use a system called virtual memory. With virtual memory on OS X, each program seems to have its own 4 gigabytes of memory. To make this work, the OS and CPU work together to automatically share physical RAM with each application, temporarily storing the contents of memory to your hard disk when it’s not being used and can’t fit into actual RAM. If too many programs are using too much memory, you’ll notice decreased performance as virtual memory uses the hard disk to store more and more memory. Adding more RAM will decrease the need for virtual memory, thus decreasing the need for hard disk access.

Virtual Address Space:

For the purpose of understanding memory address space, think of RAM as a street with houses, where each house has a unique street number address. As it turns out, many of the numbers a CPU deals with are addresses, allowing it to keep track of where information is living in RAM. Again, each house (or byte of RAM) has to have a unique address. For reasons discussed earlier, in a 32-bit CPU, there can only be 4,294,967,296 possible addresses (0 through 4,294,967,295).

This limits a 32-bit computer program to using 4 GB (about 4.3 billion bytes) of virtual memory. An individual program can’t use more than 4 GB, because the CPU can’t fit the address into a register. To use the street metaphor, imagine that you’re trying to send a FedEx shipment, but the form you’re filling out only allows for writing a 4-digit street number. There’s simply no way to send a package to #10000! The solution is to add more digits, or in the case of RAM, more bits.

On OS X, each 64-bit process gets its own 256 TB (262,144 GB) of virtual address space. It’s interesting to note that this is only 48 bits of address space and not the full 64. 256 TB is already so insanely huge that providing the full 64 bits of address space (allowing 16 billion gigabytes of virtual memory) seemed like overkill, a little bit of a waste of silicon to say the least.
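The address space sizes can be worked out the same way as the scoreboard ranges. This is a quick Python sketch that counts a gigabyte as 2 to the power 30 bytes, which is the convention behind the 262,144 GB figure; the constant and function names are only for illustration.

```python
GIB = 2**30  # bytes in one gigabyte, counted in powers of two
TIB = 2**40  # bytes in one terabyte, counted in powers of two

def address_space_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2**address_bits

for bits in (32, 48, 64):
    size = address_space_bytes(bits)
    print(f"{bits}-bit: {size // GIB:,} GB = {size / TIB:,.3f} TB")
```

The 48-bit row comes out to 262,144 GB, or 256 TB, and the 64-bit row to a little over 16 million TB.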

In 64-bit OS X, virtual memory is limited only by your hard disk’s ability to store the swap file. In theory, there should be a performance boost as the larger virtual address space makes accessing large files more efficient.

Physical Address Space:

If you’re a real power user, you may be scratching your head and saying “I’ve had 16 GB of RAM in my Mac Pro for a while now! What do you mean I’m limited to 4 GB?” Simply put, adding physical RAM above 4 GB to a 32-bit system is possible in some machines due to a feature called PAE. It’s still true that each individual 32-bit program can only address 4 GB, but adding additional RAM reduces the amount of expensive disk usage that occurs when more than one application is using large amounts of RAM. In other words, each of your running programs can still only use up to 4 GB of virtual memory, but they don’t have to compete with each other for the same 4 GB of physical RAM.

PAE adds support for 4 more bits of address space. Since each extra bit doubles the range, PAE allows for up to 64 GB of physical RAM to be addressed on 32-bit machines. Theoretically, the move to 64-bit allows for 4 PB (about 4.2 million GB) of physical RAM. Since that’s just completely absurdly big, current CPU architectures only allow for 1 TB (1,024 GB) of physical RAM (8 extra bits).
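The doubling is easy to tabulate. In this Python sketch a GB means 2 to the power 30 bytes, and the 52-bit case is included on the assumption that it is the address width behind the theoretical 4 PB figure; `physical_ram_gb` is a made-up helper name.

```python
def physical_ram_gb(extra_bits, base_bits=32):
    """Addressable physical RAM in GB (2**30 bytes) for a widened address."""
    return 2**(base_bits + extra_bits) // 2**30

# Each extra bit doubles the limit, starting from the plain 32-bit 4 GB.
for extra in range(0, 5):
    print(f"{32 + extra} address bits: {physical_ram_gb(extra)} GB")

print(physical_ram_gb(8))   # 40 address bits: 1024 GB, i.e. 1 TB
print(physical_ram_gb(20))  # 52 address bits: 4194304 GB, i.e. 4 PB
```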

So what part of this 64-bit nonsense is actually new in Snow Leopard?

While OS X has been capable of running 64-bit processes and applications for many years, it wasn’t often that you’d come across one. For Snow Leopard, Apple has ported nearly all of the built-in applications and system processes to be 64-bit. This is no small feat, as porting to a different architecture is always fraught with peril, revealing hidden programming errors (where programmers made assumptions; in this case, about the number of bits in a value), and forcing the rewrite and modernizing of old parts of the code (Apple took the opportunity to remove certain ancient technologies from the system.)

Existing 32-bit applications from developers will still run normally. One notable hiccup is with code plug-ins: enhancements, QuickTime components, Audio Units, contextual menu plug-ins, and other situations where an application loads other code. Plug-ins created by 3rd parties are most likely 32-bit-only, and will not work with Apple’s 64-bit applications until developers can ready a 64-bit capable version.

64-bit Kernel:

A kernel is the central process of an operating system. It manages resources like memory and hardware, putting a safe layer between applications and the nasty stuff.

By default, the kernel in Snow Leopard remains a 32-bit process. A full 64-bit kernel is available for the brave, and is probably coming for everyone in the future. Apple is waiting on the porting of all 3rd party extensions that users rely on to interface with their hardware.

A 64-bit kernel process would have the same advantages described so far: a performance boost due to architectural changes, and the ability to address nearly limitless amounts of RAM. Another major advantage of a 64-bit kernel has to do with virtual memory. With a 64-bit kernel, the kernel and applications no longer have to squeeze into, or switch between, cramped 32-bit address spaces. The 64-bit virtual address space is large enough that the OS can keep the kernel and each application in different parts of it, preventing overlap of addresses. This way, the CPU doesn’t have to switch address spaces on every “system call” (the mechanism applications use to talk with the kernel), which speeds those calls up.

Transitions:

OS X 10.6 is one step in the gradual march to 64-bit that started in OS X 10.4. The final steps will be the kernel, and 3rd party applications getting on the 64-bit bandwagon.

This marks Apple’s 3rd major architecture change in Mac OS, and they’ve shown again that they can do it gracefully, and with little negative impact on the end user. Whether or not the minor performance improvement is worth the minor hassle, 64-bit is the inevitable future of computing, so it’s important that Apple is making the switch now, and doing it well.


One Response to “Understanding 64-bit in Snow Leopard”

  1. mike Says:

    I’ve heard there are some wildlife groups trying to get Apple to do more stuff with the actual S.L.’s lol. I don’t know- people are saying it’s good PR for Apple- they should jump on that.
