Documenting .lin format used for some console titles
#134
Replies: 22 comments
-
I forgot that I have the export load order defined already all the way up to the main menu load: https://gist.github.com/landaire/c1f516c43e0dd4dbc5c40029ff73d923

The data is serialized in this order:

    properties = [class_index, super_index, package_index, object_name, object_flags, serial_size, serial_offset]
    ida_kernwin.msg("Export data: " + ",".join(hex(n) for n in properties) + "\n")
-
My blog post: https://landaire.net/a-file-format-uncracked-for-20-years/

I guess to actually succinctly describe

For example, the "normal" representation of a byte sequence may be:

0xA, 0xB, 0xC, 0xD

Maybe the engine reads the first two bytes, does some logic, and determines that it needs to read the other two as part of it too. So it does a two-byte read, resets the position, then does a four-byte read, creating this sequence of bytes read:

0xA, 0xB, [seek - 2] 0xA, 0xB, 0xC, 0xD

This results in the following bytes being written to the LIN file:

0xA, 0xB, 0xA, 0xB, 0xC, 0xD

Proper static parsing of LIN files would require matching the exact load behavior of the engine.
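The byte-sequence example above can be reproduced with a small simulation. This is only a sketch of the idea, not the engine's code: `record_linear_reads` and the `("read", n)` / `("seek", n)` operation tuples are hypothetical names made up for illustration, and seeks are assumed to be relative.

```python
def record_linear_reads(data: bytes, ops):
    """Replay read/seek steps against a normal package and return the bytes
    that a forward-only recording of those reads would contain."""
    pos = 0
    out = bytearray()
    for op, n in ops:
        if op == "read":
            # Every read is appended to the output, even if it revisits
            # bytes that were already read before a seek.
            out.extend(data[pos:pos + n])
            pos += n
        elif op == "seek":
            pos += n  # relative seek; nothing is written for a seek
        else:
            raise ValueError(f"unknown op: {op}")
    return bytes(out)


normal = bytes([0xA, 0xB, 0xC, 0xD])
ops = [("read", 2), ("seek", -2), ("read", 4)]
lin = record_linear_reads(normal, ops)
print([hex(b) for b in lin])
```

The two-byte read followed by the four-byte re-read is what duplicates `0xA, 0xB` in the output.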
-
Great development, thanks for sharing this with all of us! I had intended to look at this myself a while ago, but couldn't make much sense of it. It seems that this format is also used by the PS2 version of UT99 (no idea if the format differs).
-
I've tried a variety of things to cut corners and avoid having to actually parse the format completely, but I keep hitting roadblocks. I think the only path forward is to re-implement parsing exactly as the game engine does, down to the seek behavior, and to use information about load order from the title to guide unpacking.

For my most recent attempt I've written a QEMU plugin for the OG Xbox XEMU emulator, recorded I/O operations, and I'm actually able to use this information to rewrite the files with correct lengths/offsets using my own program. This approach hits some snags though with some types that use lazy arrays, such as

This weekend I'm going to play around with reimplementing a lot of the deserializers in UELib in my own custom program, with a focus on ensuring exact 1:1 I/O behavior, and see how far that gets me. I don't know how much effort it would take to make UELib support linking at load time and loading exports like this, so for now I'm going to experiment within my own codebase for rapid prototyping.
-
Those damn lazy arrays are a curse :) Why'd they use an absolute file offset there?
I do have some unpushed work towards this (actually working); perhaps this may be of use to you? Nonetheless, I do recommend checking out the 'Next' branch, which has a much cleaner, reworked stream setup.
-
I think you can re-calculate the absolute position there on basis of the

See the UELib implementation of this:

    int elementSize = Unsafe.SizeOf<TElement>();
    // ...
    // Absolute position in stream
    long skipOffset = stream.ReadInt32();
    // ...
    ElementCount = stream.ReadLength();
    // Occurs with SCCT_Versus
    if (skipOffset == 0)
    {
        StorageSize = ElementCount * elementSize;
        StorageOffset = stream.AbsolutePosition;
        stream.Position += StorageSize;
        return;
    }
    StorageSize = skipOffset - stream.AbsolutePosition;
    StorageOffset = stream.AbsolutePosition;
    stream.AbsolutePosition = skipOffset;

You can assume that
-
Yeah, I noticed after writing that comment that I was using relative offsets, not stream absolute offsets. Fixed that and they load fine in UE Explorer, but not in UPK Explorer. It's rather odd because none of the other exports seem to throw exceptions, so I'm not sure why UPK Explorer is complaining.
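To illustrate the relative-versus-absolute mix-up with lazy arrays, here is a minimal sketch of writing a lazy array so that its skip offset is an absolute stream position, which is what the UELib reader quoted above expects. `write_lazy_array` is a hypothetical helper, the element count is passed in already encoded (its encoding is not modeled here), and little-endian byte order is an assumption.

```python
import struct


def write_lazy_array(out: bytearray, count_bytes: bytes, payload: bytes) -> int:
    """Append a lazy array to `out` and return the skip offset written.

    The skip offset is the absolute position in `out` of the first byte
    after the payload, not an offset relative to the array or the export.
    """
    field_pos = len(out)  # absolute position of the 4-byte skip field
    skip_offset = field_pos + 4 + len(count_bytes) + len(payload)
    out.extend(struct.pack("<i", skip_offset))
    out.extend(count_bytes)
    out.extend(payload)
    return skip_offset


# The array starts 16 bytes into the rebuilt stream, so a relative offset
# (8) and the absolute offset (24) differ.
stream = bytearray(16)
skip = write_lazy_array(stream, b"\x03", b"abc")
print(skip, len(stream))
```

With this layout a reader computes the storage size as the skip offset minus the position right after the count, as in the C# code above.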
-
Do yourself a favor: don't bother with it. There are no "magic" bytes, only engine logic. You need a perfect understanding of how Unreal Engine works; just knowing the basics is not enough. Even having the source code of the game will not solve the reading problem.
The reading order is a combination of processing the ExportMap (real objects) and ImportMap (dependencies) of each package. For example, the first package in a LevelName.lin file will be LevelName.unr, the second will be Engine.u, then Core.u, and so on. LIN stores only the objects necessary to build the level. This means that all content from .utx/.usx/etc. is split into *.lin packages, and all of them must be loaded to recreate the game content.
-
@VendorX yes, I know. Please read my blog post... or even this issue. Currently I can dump data at runtime with some decent success (some data changes between deserialization and re-serialization), or use runtime information to statically rewrite the files. The latter has success except for types which use lazy arrays, which I'm currently debugging.
I have enough information from how the game conducts I/O at runtime to write tests which assert that an external reader processes the data in the exact same way. This is not a technically challenging problem at this point.
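A test of that kind could look roughly like the sketch below. The trace format (a list of `("seek", pos)` / `("read", size)` tuples) and the names `TracingReader` and `assert_same_io` are hypothetical; the real dumped metadata may be shaped differently.

```python
import io


class TracingReader:
    """Wraps a byte buffer and logs every seek/read an external parser makes."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)
        self.trace = []

    def seek(self, pos: int) -> None:
        self.trace.append(("seek", pos))
        self._buf.seek(pos)

    def read(self, size: int) -> bytes:
        self.trace.append(("read", size))
        return self._buf.read(size)


def assert_same_io(expected, actual) -> None:
    """Fail at the first operation where the parser diverges from the game."""
    for i, (want, got) in enumerate(zip(expected, actual)):
        assert want == got, f"op {i}: game did {want}, parser did {got}"
    assert len(expected) == len(actual), "trace lengths differ"


# Hypothetical recorded trace: a 2-byte peek, a rewind, then a 4-byte read.
recorded = [("seek", 0), ("read", 2), ("seek", 0), ("read", 4)]

reader = TracingReader(bytes(range(8)))
reader.seek(0)
reader.read(2)
reader.seek(0)
value = reader.read(4)
assert_same_io(recorded, reader.trace)
```

Failing on the first mismatching operation keeps the report close to the deserializer that diverged.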
-
Good for you ... and good luck - you will need it.
-
@EliotVU sorry for the lazy question, but what changes in particular did you think would be useful from the

I took a look and I see at least two distinct architectural changes that would need to occur:

I've been experimenting in my own codebase and have managed to get decent parsing of the objects I've defined so far (it's a small subset so far but

My approach has been to load dumped runtime metadata (you can find this here but please note this is ~100MB, so maybe just

For example, the metadata contains a top-level key

This array contains each individual seek/read operation that occurs in the game engine when loading

Note: this list does NOT contain seeks/reads from reading the package header. These are only from creating export objects.

There is also a top-level key called

These are the full names of each item passed to

And finally there's

This is a 1:1 mapping of the file name which triggered package load and can be used for reconstructing the filesystem layout.
-
Generally it is easier to follow through the serialization code, which would be useful for your 1:1 implementation. Additionally, if one were to attempt to implement this format in UELib, the 'Next' branch refactoring of the IUnrealStream implementations attempts to ease the difficulty of processing the underlying data, such as decompressing etc.
Oh I see, so basically a single object's serialized format can be spread sporadically across the package stream? In that case, just directly instructing
Yes, done with the new package linking code, UnrealPackageLinker.cs (which is also a prerequisite to make this format work). At the moment this is disabled because it results in a stack overflow on some packages, such as UT99, but it works fine on UT2004. Furthermore, I'll also publish the code for .UMD (Unreal package module) support to the 'Next' branch.
-
Yeah, so for instance when deserializing a field, the engine deserializes its object index and then attempts to load the object. The load will cause a seek + read, but since the stream doesn't seek you end up with a byte sequence starting with object A's data, then the field's object (object B) index, then object B's data IF it hasn't been loaded yet.
Sounds good though. I'll take a look at it this evening and see if I can port some of my stuff over.
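A toy model of that interleaving is below. The object layout used here (a list of raw byte parts and `("ref", index_bytes, target)` tuples) and the `linearize` function are invented purely for illustration and are not how the engine represents objects.

```python
def linearize(name, objects, loaded, out):
    """Append the bytes a forward-only stream would contain when `name` is
    loaded: its own data, with any not-yet-loaded referenced object's data
    spliced in right after the reference index."""
    if name in loaded:
        return
    loaded.add(name)
    for part in objects[name]:
        if isinstance(part, bytes):
            out.extend(part)
        else:
            _kind, index_bytes, target = part
            out.extend(index_bytes)  # the object index is always present
            linearize(target, objects, loaded, out)  # data only on first load


objects = {
    "A": [b"A1", ("ref", b"\x02", "B"), b"A2"],
    "B": [b"B1"],
    "C": [b"C1", ("ref", b"\x02", "B")],
}

loaded = set()
stream = bytearray()
linearize("A", objects, loaded, stream)
linearize("C", objects, loaded, stream)
print(bytes(stream))
```

Object C references B as well, but B was already loaded while reading A, so only the index follows C's data.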
-
The Common package isn't a problem; in most cases it contains full / empty files with header only. To benefit from export offsets, it must be reconstructed. LevelName.lin is a different story. TIP: SCCT and SCDA are different (*.ulnc / *.hlnc), which makes it very easy to rebuild the original package structure. However, there are many 'holes' because all packages were stripped of unused game components after they were shipped.
-
I'm not quite sure what you mean. I've only looked at SC1 so that's all I can speak to, but the engine has a list of things that it deterministically loads before loading the

Once all of these items are loaded, the engine loads

For example, in SC1 the header for

This is because the engine loads
-
The problem is that CreateExport only creates an empty object, which will later be loaded by PostLoad (it will be placed on the stack in GObjectLoaded). However, many factors can periodically trigger its execution.
-
This is a dead end ... and has nothing to do with PreLoad. Check EndLoad, which is responsible for calling PreLoad.
-
I understand. My tooling handles this just fine.
-
Moving this to discussions. Looking forward to seeing how this progresses.
-
Late to respond here, but I did see progress a while back on .lin in the case of the original Splinter Cell on OG Xbox. Really promising stuff. Here's to more support being added for other Ubisoft UE2 titles that use formats like .lin (and .umd too).
-
Over the past couple weeks I've been reverse engineering the .lin files used in Splinter Cell for the original Xbox and am currently in the process of writing a blog post about how cursed the format is. I saw some discussion in #104, but I thought that this might warrant a dedicated issue for documenting the format. I'll try to be succinct as not to overwhelm with info, and feel free to ask any clarifying questions for anything. I understand that some of this is known, but just to centralize everything I'll include everything anyways:

Some high-level notes about the format:

Each map directory contains at least two files:

* common.lin contains a file table and some shared data
* mapname.lin (like menu.lin) contains level-specific data.

The files (at least for SC1 on OG Xbox) have the following outer .lin structure:

For common.lin in this game, the first 4 independently-compressed zlib blocks are:

The common.lin and mapname.lin values for these fields get added together after both are decoded.

The common.lin file has a file table not present in map-specific .lin files which looks something like:

Note: As far as I can tell the suspected load addresses never change for any .lin file. The load addresses don't matter anyways -- nothing gets mapped to these addresses at runtime and for reasons explained further down they're irrelevant.

The partial file table for SC1 is:
The offsets and lengths in this table are not to be trusted. They are simply wrong. The last entry has an offset of 0x9600F0 which is very much outside of the range of common.lin, and for known maps (like menu.lin) the data doesn't make sense no matter how you spin it. It's not at offset 0x0 in either file, and it's not 0xDEEE bytes long.

The offsets and byte sizes in this format do not matter because there are various tricks used to fake the sizes of data read and the file readers cannot seek.
Because the underlying data is compressed, you cannot easily map a decompressed offset to a compressed offset and partially decompress data. As far as I can tell from reverse engineering, the game engine treats a Seek() as a position property update, and the underlying reader's Seek() is a no-op.

The data can only be read going forward and seeking never actually happens.
The file table is only read as a consequence of parsing the data before it, and the file that comes after the file table data is only read as a consequence of it always being the first file that's read. Without knowing the first file that's read, you can carve out all Unreal Engine Package files (magic 0x9E2A83C1), but those will have incorrect exports.

The first package in SC1 is Engine which has the following header (your lib parses this fine btw):

The described offsets are wrong. The counts are correct. If you just parse the data up to the end of the export table, everything is fine.
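The carving step mentioned above (scanning decompressed data for the package magic) can be sketched as follows. This assumes the magic is stored little-endian, as on the x86-based original Xbox, and that the input is already-decompressed data; `find_package_offsets` is a hypothetical helper name. Hits are only candidates, since the same four bytes can also occur inside ordinary data.

```python
import struct

# 0x9E2A83C1 stored little-endian is the byte sequence C1 83 2A 9E.
PACKAGE_MAGIC = struct.pack("<I", 0x9E2A83C1)


def find_package_offsets(decompressed: bytes):
    """Return every offset at which the package magic appears."""
    offsets = []
    pos = decompressed.find(PACKAGE_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = decompressed.find(PACKAGE_MAGIC, pos + 1)
    return offsets


# Synthetic buffer with two fake package headers.
blob = b"junk" + PACKAGE_MAGIC + bytes(8) + PACKAGE_MAGIC + b"tail"
offsets = find_package_offsets(blob)
print([hex(o) for o in offsets])
```

As noted above, packages carved this way still carry the incorrect export offsets and sizes, so this only locates headers.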
For SC1 the first object that the engine attempts to resolve is Engine.GameEngine, which forces the System\Engine.u script to be loaded and the GameEngine export to be parsed. This of course requires resolving its super objects, which triggers a read of its parent types Engine.Engine, Core.Subsystem (? I think this is in Core), and Core.Class.

Since all of these are lazy-loaded, the exact load order is:

1. Engine.u header read/parse
2. Engine's GameEngine export lookup
3. Engine.Engine object lookup
4. Core.Subsystem object lookup
5. Core.u header read/parse
6. Core.Class object lookup
7. Core.Class property deserialization
8. Core.Subsystem property deserialization
9. Engine.Engine property deserialization
10. Engine.GameEngine property deserialization

Upon finishing deserializing an export, the game engine gets the stream position, calculates the delta from the recorded pre-serialization position, and asserts that the read size equals the export's SerialSize.

Some of these properties seem to trigger a seek to do reads (possibly to parse the property class type info?), which you'd think would throw off that logic. But those cases seem to restore their prior position when finishing their work. Since seeks are a nop, this causes the stream to advance but restores the position field used for the above logic, so the deserializer thinks that the accurate amount of data was read even if it wasn't.
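The way the SerialSize check still passes can be modeled with a forward-only reader whose seek only updates a position field. This is a sketch under assumptions: the class and function names are hypothetical, and the detour target (0x1000) and sizes are arbitrary values chosen for the demo.

```python
class ForwardOnlyReader:
    """Reader where seek() only changes the reported position; the
    underlying cursor always moves forward."""

    def __init__(self, data: bytes):
        self._data = data
        self.consumed = 0  # physical bytes actually read from the stream
        self.position = 0  # logical position the deserializer sees

    def seek(self, pos: int) -> None:
        self.position = pos  # no physical movement

    def read(self, size: int) -> bytes:
        chunk = self._data[self.consumed:self.consumed + size]
        self.consumed += size
        self.position += size
        return chunk


def load_export(reader, serial_offset, serial_size, detour_size):
    """Deserialize one export with a mid-way detour; return physical bytes used."""
    reader.seek(serial_offset)
    start = reader.position
    physical_start = reader.consumed
    first = serial_size // 2
    reader.read(first)
    saved = reader.position
    reader.seek(0x1000)       # detour to some other (arbitrary) location
    reader.read(detour_size)  # physically consumes the bytes that come next
    reader.seek(saved)        # restores only the logical position
    reader.read(serial_size - first)
    # The engine-style check compares logical positions only.
    assert reader.position - start == serial_size
    return reader.consumed - physical_start


reader = ForwardOnlyReader(bytes(0x200))
used = load_export(reader, serial_offset=0x0, serial_size=0x1C, detour_size=0x40)
print(hex(used))
```

The logical delta equals SerialSize while the physical stream advanced further, which is why the recorded offsets and sizes cannot be used to locate data.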
This means however that for an export with an offset of 0x0 and size of 0x1c, the complete deserialization sequence may start at 0x0 and end at (random number here) 0x100 depending on what its properties contain, whether or not its parent types have been deserialized yet, and their properties as well.

This inherently means that this format is 100% load-order dependent. Successful deserialization of this data depends on examining which order the game itself loads data, or else your exports will be reading garbage data from incorrect positions.
IF this load order were obtained and documented (which I can do), this library might be able to enter into an "ignore seeks, they aren't real" mode that loads the exports in the exact sequence that the game loads them when entering a map. From there you can prune any unloaded exports/imports and statically reserialize the file with correct sizes/offsets. I don't know enough about your implementation to say how difficult this would be in practice.
Note that I've managed to dump some packages from the game by doing some binary patches and calling the serialization routine at runtime. These files do successfully load in UE Explorer.
I'll link the blog post once it's out which will have deeper technical details so that the work can be reproduced.