What Is Hex Editing?
To "hex edit" means to make changes to the binary data -- 1's and 0's -- that make up a computer file at the fundamental level. "Hex" is short for "hexadecimal," something that we're going to get to shortly.
Step 1: Learn to Count in Binary
There are a couple of things you need to master before you should even bother downloading a hex editor. The first of these is counting in binary.
Reflect for a moment on our ordinary way of counting and writing numbers, the decimal system. We have ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. We use the concept of "digits" to express numbers larger than our largest symbol, 9. For instance, to express thirteen, we use a 1 in the "tens digit" and a 3 in the "ones digit." Similarly, we have a "hundreds digit" to represent tens of tens, and so on out to infinity.
The binary system is the same as the decimal system, except that we only have two symbols: 0 and 1. This means that we need to utilize higher digits at a much faster rate because our largest symbol is 1. So, instead of needing a new digit every time we hit a power of ten, in binary we need to add a new digit every time we hit a power of two. So we have a "ones digit," a "twos digit," a "fours digit," an "eights digit" and so on. To use the same example as above, if we want to express thirteen, we need 1 in the "eights digit," 1 in the "fours digit," 0 in the "twos digit," and 1 in the "ones digit," or 1101.
Now, here are some examples. Keep working at these until you "get" binary. You will be wasting your time if you try to go forward before you understand this. Seriously.
zero = 0
one = 1
two = 10
three = 11
four = 100
five = 101
six = 110
twelve = 1100
fifteen = 1111
one hundred thirty-eight = 10001010
1 + 1 = 10
10 + 1 = 11
11 + 1 = 100
110 + 10 = 1000
1001 + 110 = 1111
1001 * 10 = 10010
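If you want to check your answers, a few lines of code will do it. The sketch below uses Python (my choice for illustration; nothing else in this guide depends on it). The built-ins `bin()` and `int(text, 2)` are real; `to_binary` is a hypothetical helper that spells out the digit-by-digit method described above.

```python
def to_binary(n):
    """Convert a non-negative integer to binary text by repeated division by two."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next digit, starting at the "ones digit"
        n //= 2
    return "".join(reversed(digits))

print(to_binary(13))         # thirteen -> 1101
print(int("10001010", 2))    # binary text back to decimal -> 138
print(bin(0b1001 + 0b110))   # addition with binary literals -> 0b1111
```

Work the examples by hand first, then use this only to confirm what you got.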
Step 2: Learn to Count in Hexadecimal
Where binary is a number system that reduces our normal ten symbols down to just two, hexadecimal is a number system that increases the number of symbols up to sixteen: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. This means we utilize higher digits at a slower rate, since they are powers of sixteen.
Customarily, the prefix "0x" is used to designate a hexadecimal value.
Again, here are some examples. Again, keep working on them until you "get" hexadecimal, since there's no point in going on without understanding this.
one = 0x1
two = 0x2
nine = 0x9
ten = 0xA
eleven = 0xB
fourteen = 0xE
fifteen = 0xF
sixteen = 0x10
nineteen = 0x13
one hundred thirty-eight = 0x8A
0x1 + 0x1 = 0x2
0x2 + 0x2 = 0x4
0x9 + 0x1 = 0xA
0x8 + 0x5 = 0xD
0xF + 0x5 = 0x14
0xA * 0x2 = 0x14
0xB * 0x3 = 0x21
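The same kind of self-check works for hexadecimal. This is a small Python sketch: `hex()` and `int(text, 16)` are built-ins, while `to_hex` is a hypothetical helper I've added only so the output uses the uppercase style shown in the examples above.

```python
def to_hex(n):
    """Format a non-negative integer with the 0x prefix and uppercase digits."""
    return "0x" + format(n, "X")

print(to_hex(138))          # one hundred thirty-eight -> 0x8A
print(int("8A", 16))        # hexadecimal text back to decimal -> 138
print(to_hex(0xF + 0x5))    # -> 0x14
print(to_hex(0xB * 0x3))    # -> 0x21
```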
AFTER you've mastered the fundamentals of counting/reading in hexadecimal, it's permissible to use a converter to help you deal with larger numbers that are too big to convert quickly in your head.
You may be asking yourself, "Why should I care about the hexadecimal number system if everything on the computer is stored in binary?" The answer is that there is a really, really simple way of converting between binary and hexadecimal, which makes things much easier for humans to read. Because sixteen is two to the fourth power, four binary digits fit cleanly and evenly into one hexadecimal digit, like so:
(binary) 0000 = 0x0
(binary) 0001 = 0x1
(binary) 0010 = 0x2
(binary) 0011 = 0x3
(binary) 0100 = 0x4
(binary) 0101 = 0x5
(binary) 0110 = 0x6
(binary) 0111 = 0x7
(binary) 1000 = 0x8
(binary) 1001 = 0x9
(binary) 1010 = 0xA
(binary) 1011 = 0xB
(binary) 1100 = 0xC
(binary) 1101 = 0xD
(binary) 1110 = 0xE
(binary) 1111 = 0xF
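To see the four-bits-per-hex-digit rule in action, here is a minimal Python sketch. The function name `binary_to_hex` is my own invention for illustration; it does nothing but chop the binary digits into groups of four and look each group up, exactly as the table above does.

```python
def binary_to_hex(bits):
    """Convert a string of binary digits to hex, four bits at a time."""
    bits = bits.replace(" ", "")
    # Pad with leading zeros so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "0x" + "".join(format(int(group, 2), "X") for group in groups)

print(binary_to_hex("1000 1010"))   # one hundred thirty-eight -> 0x8A
print(binary_to_hex("10010"))       # eighteen -> 0x12
```

Notice that no arithmetic on the whole number is needed; each group of four bits is converted independently.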
Now about those hex editors: The basic functionality of a hex editor program is to display and make editable the file's binary data in hexadecimal format.
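To make that concrete, here is a rough Python sketch of the display half of a hex editor. `hex_dump` is a hypothetical function, and the layout (offset, hex bytes, ASCII column) is only an approximation of what programs like HxD show, not a copy of any particular one.

```python
def hex_dump(data, width=16):
    """Return lines of offset, hex bytes, and printable ASCII, like a hex editor pane."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(format(byte, "02X") for byte in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on a short last row.
        lines.append(f"{offset:08X}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return lines

for line in hex_dump(b"Hello, hex editing!\x00\x01\xff"):
    print(line)
```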
Step 3: Download a Hex Editor
Now that you know how to count/read in binary and hexadecimal and understand what a hex editor does, it's time to go get one. There are tons and tons of hex editor programs out there. Many of them are free, and they all offer more or less the same features. Right now, I'm using HxD, which is free and perfectly adequate. Feel free to download and use that, or pick any other hex editor of your choice.
Step 3.5: Torchlight 2 Save Files Are Scrambled
Right now, you know enough to open up something simple like a .txt file in your hex editor and probably figure out how to do what you want to do. After the next section, you'd probably be able to at least begin working on something like a save state from a SNES emulator. Unfortunately, Torchlight 2 save files are a little bit trickier because Runic scrambled them to make them a little harder to edit, and added a checksum because Steam Cloud save was corrupting files, which caused Torchlight 2 to crash. The exact details of the scrambling and checksum are covered in the file spec. For right now, what you need to know is that you need to use the decrypter/encrypter to descramble the save files into hex-editable form. Then you must use it again after editing to get the file back into the form Torchlight 2 will recognize.
Step 4: Understanding What You're Looking At
Strictly speaking, there are no rules about how a given unit of information should be represented in binary. There are, however, conventions, which are usually followed. Additionally, if you use any programming language other than assembly language, by default your program is going to use whatever data representation scheme your compiler uses. And compiler writers follow conventions. So, since C/C++ is the most popular language for serious software, it's usually a pretty safe bet that things are being written out in typical C/C++ data structures. And, indeed, that is the case with Torchlight 2 save files.
VERY IMPORTANT NOTE: Before taking a look at some of the C/C++ data structures, there's one really important thing to learn: Windows computers are LITTLE ENDIAN. For reasons having to do with making hardware more efficient, the folks at Intel decided that it was best to have numbers stored with their least-significant byte (a byte is 8 bits; a bit is a single 1 or 0) first. This leads to the crazy, crazy result that things will look like this in your hex editor: Data structures are ordered left-to-right within the file; bytes within a data structure are ordered right-to-left; and then bits within a byte are ordered left-to-right. So, on the byte level, and the byte level only, you need to read everything backwards. Yes, that's positively insane, but that's what it is.
Example: two hundred fifty-eight is 0x00000102 as a 4-byte int, which shows up in your hex editor as the bytes 02 01 00 00 (little endian).
If you didn't get that, go back and read it again until it doesn't confuse you.
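If it helps, you can watch the byte order flip yourself. This is a small Python sketch using the standard `struct` module, where `"<"` means little endian, `">"` means big endian, and `"I"` means a 4-byte unsigned int. The `bytes.hex(" ")` call with a separator needs Python 3.8 or later.

```python
import struct

value = 258  # 0x00000102

little = struct.pack("<I", value)  # the order you will see in the save file
big = struct.pack(">I", value)     # the "natural" reading order, for comparison

print(little.hex(" "))   # 02 01 00 00
print(big.hex(" "))      # 00 00 01 02

# Reading the bytes back only works if you use the right byte order.
(decoded,) = struct.unpack("<I", bytes([0x02, 0x01, 0x00, 0x00]))
print(decoded)           # 258
```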
Common C/C++ Data Structures (Windows 32 bit programs):
The Unsigned Int
This is probably the most common data structure. It's 4 bytes long. Possible values are non-negative integers, zero through 4294967295.
The Signed Int
What happens when you need to have a negative integer? You use a signed int instead. Again, 4 bytes long. Possible values range from -2147483648 to 2147483647. The first bit is used for the positive or negative sign. Positive numbers will look exactly the same as an unsigned int. Negative numbers, however, are stored using a system called two's complement. You don't really have to understand two's complement because Torchlight 2 uses very few negative numbers you might care about. It is worth noting, however, that Torchlight 2 often uses a signed int where it should use an unsigned int, so there's a danger of wrapping into negative numbers when you try to exceed 2147483647.
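You can get by without studying two's complement, but the wrapping danger is easier to respect once you've seen it. The Python sketch below uses two hypothetical helpers, `as_signed32` and `as_unsigned32`, to reinterpret the same 32 bits both ways; it illustrates the general rule for 4-byte ints, not anything specific to how Torchlight 2 is written.

```python
def as_signed32(unsigned_value):
    """Reinterpret a 4-byte unsigned value as a two's complement signed int."""
    if unsigned_value >= 2**31:          # top bit set means the number is negative
        return unsigned_value - 2**32
    return unsigned_value

def as_unsigned32(signed_value):
    """Reinterpret a signed int as the unsigned value with the same 32 bits."""
    return signed_value & 0xFFFFFFFF

print(as_signed32(0x7FFFFFFF))     # largest positive value: 2147483647
print(as_signed32(0x80000000))     # one step further wraps to -2147483648
print(as_signed32(0xFFFFFFFF))     # all bits set is -1
print(hex(as_unsigned32(-1)))      # 0xffffffff
```

So if a value is stored as a signed int, writing anything from 0x80000000 upward will show up in the game as a negative number.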
The Signed/Unsigned Short Int (a.k.a Short)
The same as a signed/unsigned int, only it's 2 bytes long. Possible values are -32768 to 32767 for the signed short, and zero to 65535 for unsigned.
The Signed/Unsigned Long Int (a.k.a. Long)
Exactly the same as a signed/unsigned int.
The Signed/Unsigned Long Long Int (a.k.a. Long Long)
The same as a signed/unsigned int, only it's 8 bytes long. Possible values are -(2^63) to 2^63 - 1 for the signed long long, and zero to 2^64 - 1 for unsigned. Torchlight 2 uses these for GUIDs, which occur reasonably often.
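The integer types above differ only in width and in whether the top bit is treated as a sign. Here is a minimal Python sketch that reads any of them from a byte string; `read_int` is a hypothetical helper built on the real `int.from_bytes` method, and the sample bytes are made up.

```python
def read_int(data, offset, size, signed):
    """Read a little-endian integer of `size` bytes starting at `offset`."""
    return int.from_bytes(data[offset:offset + size], "little", signed=signed)

raw = bytes([0xFF] * 8)  # eight 0xFF bytes, a made-up sample

for name, size in [("short", 2), ("int", 4), ("long long", 8)]:
    unsigned = read_int(raw, 0, size, signed=False)
    signed = read_int(raw, 0, size, signed=True)
    print(f"{name} ({size} bytes): unsigned={unsigned}, signed={signed}")
```

The same bytes give the maximum value when read as unsigned and -1 when read as signed, whatever the width.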
The Float
What happens when your number needs a decimal point? Enter the float, or floating point number. It's 4 bytes long. The representation of floating point numbers is, well, very complicated. Probably the best approach is to learn to recognize potential floats by the fact that, unlike small ints, their high bytes are almost never empty (0.0 being the main exception), and then plug things that you think might be floats into a calculator (remember to correct for little endianness) and see if those bytes constitute a float with a sane value that Torchlight 2 might be using. Fortunately, it seems that Torchlight 2 uses floats exceedingly rarely. (The only two I'm presently aware of are current hp and current mp.) There are also two 8-byte variants on the float, which are not of interest to us here.
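Instead of a calculator, you can test a suspected float with a couple of lines of Python. This sketch uses the standard `struct` module (`"<f"` is a little-endian 4-byte float); `read_float` is a hypothetical helper, and the value 100.0 is just a made-up example, not a number taken from a real save file.

```python
import struct

def read_float(data, offset=0):
    """Interpret 4 bytes at `offset` as a little-endian float."""
    (value,) = struct.unpack_from("<f", data, offset)
    return value

suspect = bytes([0x00, 0x00, 0xC8, 0x42])  # as they would appear in the hex editor

print(read_float(suspect))                 # 100.0, a sane value
print(struct.unpack("<I", suspect)[0])     # the same bytes read as an unsigned int
print(struct.pack("<f", 1.0).hex(" "))     # 00 00 80 3f
```

If the float reading gives a sensible number and the int reading gives a huge, odd-looking one, you are probably looking at a float.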
The Signed/Unsigned Char
A single byte. It can represent a number, in which case possible values are -128 to 127 for the signed char and zero to 255 for unsigned. It can also represent an alphanumeric character in ASCII format.
Strings
There are several different ways to store a string of text. Torchlight 2 happens to use the UTF-16LE format with a length prefix. The details are as follows: First there is an unsigned(?) short int that specifies how many characters are in the string. (Note that this short will still be present as 0x0000 if there's an empty string.) Then the characters in the string follow left-to-right as Unicode characters, two bytes each. Because the least-significant byte of a Unicode character in the ASCII range is the same as its ASCII code, and because Torchlight 2 is all in English and doesn't use the extended characters that need the most-significant byte, each character will always look identical to an ASCII character followed by 0x00.
The Bool
A bool, short for Boolean, is a value that's either true or false. Any data type can be used to represent a bool, but the most common are char and int. (C++ compilers typically store the built-in bool type in a single byte, the same size as a char, and that's what Torchlight 2 uses.) Regardless of which structure is used, zero means "false" and anything else means "true."
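The string layout described here (a 2-byte character count followed by two bytes per character) can be sketched in Python as follows. `read_string` and `write_string` are hypothetical helpers, the count is assumed to be unsigned, and "Hero" is a made-up sample, not a value from a real save file.

```python
import struct

def read_string(data, offset):
    """Read a length-prefixed UTF-16LE string; return (text, offset just past it)."""
    (char_count,) = struct.unpack_from("<H", data, offset)  # 2-byte count, assumed unsigned
    start = offset + 2
    end = start + char_count * 2                             # two bytes per character
    return data[start:end].decode("utf-16-le"), end

def write_string(text):
    """Build the length prefix plus the UTF-16LE bytes for `text`."""
    encoded = text.encode("utf-16-le")
    return struct.pack("<H", len(encoded) // 2) + encoded

sample = write_string("Hero")
print(sample.hex(" "))          # 04 00 48 00 65 00 72 00 6f 00
print(read_string(sample, 0))   # ('Hero', 10)
```

Returning the offset just past the string matters because whatever follows a string has no fixed position; you have to step over the string to find it.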
Step 5: Finding What You Want to Change
So, now we know how to understand what we're seeing in the hex editor. But our goal isn't reading comprehension. Our goal is changing the values of certain variables to be what we want them to be. How do we find a particular variable that we want to change?
There are no hard and fast rules here, but here are some useful guidelines:
1. Remember that the programmer is a human being. He or she probably put things in an order that was logical to him or her. For example, you will find the int for Dexterity immediately following the int for Strength.
2. Start a search for a given variable by running Torchlight 2 to find out what the current value is. Then make a copy of the save file. Then go back into Torchlight 2 and do something that will change the variable you care about, while changing as little else as possible. Then make another copy of the save file. Now look to find a value in the save files that underwent the same change in value as the variable in the game. You will probably find more than one. Repeat this as necessary until you can whittle it down to a few possibilities.
Once you're down to a few possibilities, try changing one and see what happens. If you guessed right, the desired variable will change. If you guessed wrong, either you'll get a crash, or the wrong variable will change. Restore from backup and try a different one.
3. Use the 0xFF "dividers" as landmarks. Back in the old days (say, for example, working with SNES save states) everything had a fixed length, and you could always reference something by its offset from the first byte in the file. Sadly, you cannot do this with Torchlight 2 save files. Strings can have arbitrary lengths and are not the same from file to file. Various things (for example skills and items) come in arbitrary amounts and you need to find a counter that says how many of them there are. (Usually it's immediately preceding the first one.) Generally the best way to work around this problem is to locate a block of several 0xFF in a row. It's not clear if these are deliberate "dividers" (not data), or uninitialized fields that default to -1. In either case, many of them are always the same in every file, every time. Often the best way to reference something is by its offset from the last block of 0xFF or backwards offset from the next block. (Just be careful about data structures that might result in a meaningful 0xFF byte adjacent to a block of unchanging 0xFF.)
4. The file spec is a useful map of where a lot of things are. Before spending a lot of time searching for something, check the file spec to see if it's already been found.
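Guidelines 2 and 3 can be partly automated. The Python sketch below is only an illustration under some assumptions: both functions are hypothetical helpers, the inputs are the already-descrambled bytes of your save copies, the variable is a 4-byte signed int, and nothing before it changed length between the two copies (otherwise the offsets won't line up). The sample data at the bottom is made up.

```python
import struct

def find_changed_ints(before, after, old_value, new_value):
    """Return offsets where a 4-byte little-endian int went from old_value to new_value."""
    old_bytes = struct.pack("<i", old_value)
    new_bytes = struct.pack("<i", new_value)
    hits = []
    position = before.find(old_bytes)
    while position != -1:
        if after[position:position + 4] == new_bytes:
            hits.append(position)
        position = before.find(old_bytes, position + 1)
    return hits

def find_ff_runs(data, min_length=4):
    """Return (offset, length) for each run of at least min_length 0xFF bytes."""
    runs = []
    i = 0
    while i < len(data):
        if data[i] == 0xFF:
            start = i
            while i < len(data) and data[i] == 0xFF:
                i += 1
            if i - start >= min_length:
                runs.append((start, i - start))
        else:
            i += 1
    return runs

# Made-up sample: a 0xFF block, a stat that went from 10 to 11, and decoys.
before = b"\xff" * 4 + struct.pack("<ii", 10, 25) + struct.pack("<i", 10)
after = b"\xff" * 4 + struct.pack("<ii", 11, 25) + struct.pack("<i", 10)

print(find_changed_ints(before, after, 10, 11))   # [4]
print(find_ff_runs(before))                       # [(0, 4)]
```

In the sample, the candidate sits at offset 4, which is 0 bytes past the end of the 0xFF block. Against real files, expect several candidates and repeat the comparison with another pair of saves to narrow them down.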