xsave

Nathaniel
The next one is the xsave crate. This crate is specific to x86, that is, to Intel and AMD processors. The way it works is that Intel and AMD processors have CPU state, and sometimes you want to save that state and restore it, or sometimes you want to wipe it clean. There are basically two kinds of state here. The first is the general purpose registers. You can save and restore those manually, just by moving the data around or pushing it onto the stack. But that code goes all the way back, I think, to the 386, so it's rather old. Over time, Intel has added additional CPU features, which we call extended CPU features, and these also have state that you might need to save. For example, floating point is one of the extended CPU features. If you wanted to save the state of the floating point unit in the CPU, you would need to do what's called an xsave; xsave stands for saving the extended CPU state. The crate itself is named after the Intel x86 instruction, XSAVE.
However, xsaving the extended CPU state is actually rather complicated. So we take what we call a pragmatic approach to saving extended CPU state, versus what you might call a correct approach. The reason is that different CPUs have different extended features, so the amount of memory you need to save the extended CPU state varies from CPU to CPU. A correct implementation of xsave would first execute CPUID to determine how much memory should be allocated for saving CPU state, then allocate that amount of memory, and then execute the XSAVE instruction and pass it the allocated memory. The problem is that we actually don't want to do that, for a variety of reasons. The most important reason is that we want to be able to do xsave and xrstor inside of an SGX enclave.
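For reference, step one of the "correct" approach, asking CPUID how large the XSAVE area must be, might look like the sketch below. This is not code from the xsave crate; it assumes an x86_64 target and uses the standard library's __cpuid_count intrinsic to read leaf 0xD, sub-leaf 0, where EBX reports the bytes needed for the features currently enabled in XCR0 and ECX the bytes needed for every feature the CPU supports. On other architectures it simply returns None.

```rust
/// Sizes reported by CPUID leaf 0xD, sub-leaf 0 (illustrative, not part of the xsave crate).
#[derive(Debug, Clone, Copy)]
pub struct XsaveSizes {
    /// EBX: bytes needed for the features currently enabled in XCR0.
    pub enabled: u32,
    /// ECX: bytes needed if every supported feature were enabled.
    pub supported: u32,
}

/// Step 1 of the "correct" approach: ask the CPU how big the XSAVE area is.
#[cfg(target_arch = "x86_64")]
pub fn query_xsave_sizes() -> Option<XsaveSizes> {
    use std::arch::x86_64::{__cpuid_count, __get_cpuid_max};

    // Bail out if XSAVE is not available at all.
    if !std::is_x86_feature_detected!("xsave") {
        return None;
    }
    // Leaf 0xD only exists if the highest basic leaf is at least 0xD.
    let (max_leaf, _) = unsafe { __get_cpuid_max(0) };
    if max_leaf < 0xD {
        return None;
    }
    // Inside an SGX enclave this answer would come from the untrusted host.
    let r = unsafe { __cpuid_count(0xD, 0) };
    Some(XsaveSizes { enabled: r.ebx, supported: r.ecx })
}

/// XSAVE does not exist on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn query_xsave_sizes() -> Option<XsaveSizes> {
    None
}
```

The remaining two steps (allocate that many bytes, 64-byte aligned, then execute XSAVE on them) are exactly where the trust and allocation problems discussed next come in.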
In SGX, because you're operating inside an enclave and the host is not trusted, we can't trust the host to be honest with us about the amount of memory that should be used for extended CPU state. Think of it this way: if I'm running on a host that is untrusted, and I execute CPUID to ask for the size of the extended CPU state area, the host could lie and say, oh, we definitely need five terabytes of memory for that. All of a sudden I'm trying to allocate five terabytes of memory inside my enclave, and this could provide opportunities for an attack. The other problem is allocation. Let's say you did get back from CPUID how much memory you need to store the extended CPU state; then how do you get that memory? If you're in assembly, this is pretty easy, because you can just allocate the memory on the stack. But we want to be able to do this from Rust, in a way that's safe Rust. C compilers provide alloca, a compiler-implemented function that allocates memory on the stack. Rust doesn't have anything like that, and for good reason: alloca is strongly discouraged even in C, because allocating memory on the stack dynamically like that is fraught with problems. So we actually don't want to allocate on the stack. But if you're not allocating on the stack, you need to allocate from somewhere else, and if you're running inside a kernel, or inside a bit of code where there's no heap, and you need to store this state, you run into the problem of where you're actually going to get the memory from. So we have two real problems for a correct implementation.
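To make the first problem concrete: an enclave that did rely on CPUID would have to treat the reported size as hostile input and bound it before allocating. (CPUID reports sizes in a 32-bit register, so a lying host can claim at most about 4 GiB rather than terabytes, which is still far more than an enclave should allocate.) The sketch below is hypothetical; MIN_XSAVE_SIZE, MAX_XSAVE_SIZE and validate_reported_size are our names, and the 4096-byte cap is an assumption.

```rust
/// Smallest possible XSAVE area: 512-byte legacy region + 64-byte header.
pub const MIN_XSAVE_SIZE: u32 = 512 + 64;

/// Hypothetical upper bound we are willing to allocate (one 4 KiB page).
pub const MAX_XSAVE_SIZE: u32 = 4096;

#[derive(Debug, PartialEq, Eq)]
pub enum SizeError {
    TooSmall(u32),
    TooLarge(u32),
}

/// Treat a CPUID-reported XSAVE size as untrusted input and bound it.
pub fn validate_reported_size(reported: u32) -> Result<usize, SizeError> {
    if reported < MIN_XSAVE_SIZE {
        Err(SizeError::TooSmall(reported))
    } else if reported > MAX_XSAVE_SIZE {
        // e.g. a lying host claiming an absurd amount of state
        Err(SizeError::TooLarge(reported))
    } else {
        Ok(reported as usize)
    }
}
```

Once a fixed cap like this exists, the enclave might as well just always use a buffer of that size, which is essentially the pragmatic approach.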
One is that we can't always trust CPUID, because it could be lying. The second is that even if we did trust it, we don't know where we're going to get the memory from. So we have written what we call a practical implementation of xsave. It works like this: we have looked at a wide range of CPUs, maybe not all of them, and determined how much memory all of their features actually require. Then we specified a structure that has even more memory than that. I think we allocate something like a page, where the largest CPU, I think, uses something like 2000 bytes. In other words, we've allocated much more memory than we think we'll need, at least for the foreseeable future, and that's the size of the structure. Now we can take that structure and allocate it directly on the stack; we don't have to worry about allocating memory dynamically, and we don't have to check CPUID. The only thing that could come back to bite us with this approach is if Intel released a new CPU with a new feature whose state needed more memory than our structure provides; then we would get a crash. But we think that's a reasonable trade-off for the security and features that we get. So really, this is fairly straightforward. There's one type, called xsave, and that's the structure that is appropriately sized to hold all this information. You can call the default method on it, which gives you the default CPU state. This is useful, for example, if you want to clear the extended CPU state: you can reset it to the CPU default. So in this first example, we clear the extended CPU state by calling the default function and then calling load on the resulting structure, which loads that state into the CPU.
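A minimal sketch of the fixed-size idea, assuming (as the talk suggests) roughly one page of storage. XSAVE and XRSTOR fault on a save area that is not 64-byte aligned, and #[repr(align(64))] guarantees that alignment even for a value on the stack. The type name XSaveArea, the 4096-byte size and the choice of default values are illustrative assumptions, not the crate's definitions; the two non-zero defaults are the architectural initial values of the x87 control word (0x037F) and MXCSR (0x1F80).

```rust
/// Assumed fixed size: one page, larger than any XSAVE area we expect.
pub const XSAVE_AREA_SIZE: usize = 4096;

/// Hypothetical stand-in for the crate's fixed-size save area.
#[repr(C, align(64))] // XSAVE/XRSTOR require 64-byte alignment
pub struct XSaveArea(pub [u8; XSAVE_AREA_SIZE]);

// Compile-time check that the buffer holds at least the legacy region + header.
const _: () = assert!(XSAVE_AREA_SIZE >= 512 + 64);

impl Default for XSaveArea {
    /// "Clean" state: all zero except the x87 control word (offset 0) and
    /// MXCSR (offset 24), whose initial values are non-zero. XSTATE_BV in the
    /// header (offset 512) stays 0, meaning every component is in its init state.
    fn default() -> Self {
        let mut bytes = [0u8; XSAVE_AREA_SIZE];
        bytes[0..2].copy_from_slice(&0x037Fu16.to_le_bytes()); // FCW
        bytes[24..28].copy_from_slice(&0x1F80u32.to_le_bytes()); // MXCSR
        XSaveArea(bytes)
    }
}

impl XSaveArea {
    /// Would a CPU that needs `required` bytes fit in our fixed buffer?
    pub fn fits(required: u32) -> bool {
        required as usize <= XSAVE_AREA_SIZE
    }
}
```

Because the size is a compile-time constant, XSaveArea::default() is an ordinary stack value with no allocator involved; the price is that fits() has to hold on every CPU the code ever runs on, which is the bet described above.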
Same thing for saving and restoring extended CPU state: we call default to get an instance, then we call save to save the state, and then we can load it back again. This all works, and it's actually safe Rust. As long as our assumption about the size of the buffer isn't violated, everything is safe. Any questions about xsave?

Nathaniel
Yeah, sure. So this is what the code actually looks like. This is the xsave structure, and the extend part is where we made that estimated size. If you look at xsave extend here, it's a private structure; we don't provide access to any of its members. This is where we've allocated what we think is a reasonable amount of space, which protects us from future problems. But there's also the xsave header and xsave legacy. These exist because, for some of the main features, we want to expose the ability to read and write this information. The xsave header basically just records which features have actually been saved in the buffer. You can mostly ignore these, because most of the time you just care about either clearing the state, or saving it and loading it back again. The xsave legacy structure contains a bunch of flags and other state. You can see, for example, the xmm and mm fields: these hold the SSE and MMX registers, which are part of the extended CPU state. There's also a variety of flags defined here regarding CPU state, so if you wanted to introspect this, you could look through the data and see which CPU flags were set, and so forth. So it is a fairly complete implementation of xsave, but again pragmatic in terms of the size of the extended area, that's this bit here, which theoretically should be sized dynamically. Instead, we over-allocate and hope it's enough. Any other questions?
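The legacy region and header described here have a layout fixed by Intel's manuals for the standard (non-compacted) format: a 512-byte legacy region in FXSAVE format, then a 64-byte XSAVE header, then the extended components. The #[repr(C)] structs below are a hypothetical transcription of that layout, assuming the 64-bit FXSAVE format; the field names are ours and may not match the crate's. The const block checks the offsets at compile time with std::mem::offset_of! (stable since Rust 1.77).

```rust
use std::mem::offset_of;

/// 512-byte legacy region (64-bit FXSAVE format).
#[repr(C)]
pub struct Legacy {
    pub fcw: u16,             // x87 control word
    pub fsw: u16,             // x87 status word
    pub ftw: u8,              // abridged x87 tag word
    _reserved0: u8,
    pub fop: u16,             // last x87 opcode
    pub fip: u64,             // last x87 instruction pointer
    pub fdp: u64,             // last x87 data pointer
    pub mxcsr: u32,           // SSE control/status register
    pub mxcsr_mask: u32,
    pub st_mm: [[u8; 16]; 8], // ST0-ST7 / MM0-MM7, 16-byte slots
    pub xmm: [[u8; 16]; 16],  // XMM0-XMM15
    _reserved1: [u8; 96],
}

/// 64-byte XSAVE header that follows the legacy region.
#[repr(C)]
pub struct Header {
    pub xstate_bv: u64, // which components hold saved (non-init) state
    pub xcomp_bv: u64,  // bit 63 set means "compacted format"
    _reserved: [u8; 48],
}

// Compile-time checks against the architectural offsets.
const _: () = {
    assert!(std::mem::size_of::<Legacy>() == 512);
    assert!(offset_of!(Legacy, mxcsr) == 24);
    assert!(offset_of!(Legacy, xmm) == 160);
    assert!(std::mem::size_of::<Header>() == 64);
};
```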

Nathaniel
So the question, I'm just going to repeat it in case people online can't hear, since Richards is a bit away from the microphone. He was asking, more or less, about architecture-specific includes: how do you define whether a file is usable on a particular architecture? You can use Rust attributes for this. On basically any item or block of code, you can add an attribute at the top such as #[cfg(target_arch = "x86")], and you can have several of those, one for each architecture you support, and so forth. In this case, I don't think we're doing anything to guard against that; the code, and the assembly in particular, will just fail to compile if you try to build it on another platform. Actually, that's not true, because there's a feature flag for the assembly, and you can disable the assembly feature. This is interesting because you could, for example, save the CPU state on an x86 system, write it to disk or send it over the network to, I don't know, an Arm machine, and then parse that same CPU state area on a machine where it's not native. So you can actually compile this crate on architectures that are not x86.
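Both ideas from that answer fit in a few lines. Parsing a saved area is plain byte manipulation, so it compiles on any architecture, while anything that would actually execute XSAVE can be gated with a target_arch cfg. The function and constant names are hypothetical, and this sketch does not reproduce the crate's feature flag (the talk doesn't name it); it only assumes the standard layout, in which the header's XSTATE_BV field sits at byte offset 512 and is stored little-endian.

```rust
/// Offset of XSTATE_BV in a standard-format XSAVE area (after the 512-byte legacy region).
const XSTATE_BV_OFFSET: usize = 512;

/// Read XSTATE_BV from a saved area. Works on any host, e.g. an Arm machine
/// inspecting state that was captured on x86 and sent over the network.
pub fn xstate_bv(saved: &[u8]) -> Option<u64> {
    let bytes = saved.get(XSTATE_BV_OFFSET..XSTATE_BV_OFFSET + 8)?;
    Some(u64::from_le_bytes(bytes.try_into().ok()?))
}

/// Names of the first three state components whose bits are set.
pub fn saved_components(bv: u64) -> Vec<&'static str> {
    // Components 0, 1 and 2 of the XSAVE feature set.
    let names: [&'static str; 3] = ["x87", "SSE", "AVX"];
    names
        .iter()
        .enumerate()
        .filter(|&(i, _)| (bv & (1u64 << i)) != 0)
        .map(|(_, name)| *name)
        .collect()
}

/// Only x86 targets can actually execute XSAVE/XRSTOR.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub const CAN_SAVE_NATIVELY: bool = true;
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub const CAN_SAVE_NATIVELY: bool = false;
```

For example, a buffer whose byte 512 is 0b011 decodes to the x87 and SSE components.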

Nicolas
So with the assembly feature disabled, you can't actually save or load; you can basically just transmit it?

Nathaniel
Correct, yep. And you can inspect it, right? You can look at what flags are set and so forth.

Paul
So this would be something that would have to be done for Arm, for example, to support Realms?

Nathaniel
Arm has totally different primitives. I don't know that there's any xsave equivalent on Arm; this is basically an x86 feature.

Paul
Yeah, no, I was just thinking of when you need to save state on Arm.

Nathaniel
Yeah, Arm has its own CPU state mechanisms. I don't know what they are off the top of my head; you can look them up in the Arm documentation. As for the general purpose registers, it's probably the same thing: you just save them all manually, right? That's not Rust code. In fact, saving and restoring general purpose registers can't even be done compatibly from Rust, or really from any compiled language, because the compiled language depends on an ABI, and anything you do in that regard has to match the ABI specification. So you're limited in what you can do there. But for the extended CPU state, the ABI defines what has to hold across every function call; in the System V ABI, for example, if your function uses one of these extended features, it has to preserve the parts of that state its caller relies on. Because that's defined in the ABI, it's safe for us to do xsave and xrstor.
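One concrete piece of that ABI contract concerns MXCSR. The System V x86-64 psABI says the control bits of MXCSR are callee-saved (preserved across calls), while the status bits are caller-saved. The helper below is a hypothetical illustration, not crate code; it splits an MXCSR value, such as the one stored at offset 24 of a saved area, into those two parts using the bit positions from Intel's manuals.

```rust
/// Bits 0-5: sticky exception flags (status bits, caller-saved).
pub const MXCSR_STATUS_MASK: u32 = 0x0000_003F;
/// Bits 6-15: DAZ, exception masks, rounding control, FZ (control bits, callee-saved).
pub const MXCSR_CONTROL_MASK: u32 = 0x0000_FFC0;

/// The part of MXCSR a callee must hand back unchanged.
pub fn mxcsr_control(mxcsr: u32) -> u32 {
    mxcsr & MXCSR_CONTROL_MASK
}

/// The part of MXCSR a callee may leave modified.
pub fn mxcsr_status(mxcsr: u32) -> u32 {
    mxcsr & MXCSR_STATUS_MASK
}

/// Rounding control, bits 13-14: 0 = nearest, 1 = down, 2 = up, 3 = toward zero.
pub fn rounding_mode(mxcsr: u32) -> u32 {
    (mxcsr >> 13) & 0b11
}
```

For the initial value 0x1F80 (all exceptions masked, round to nearest), every set bit is a control bit, so a function that changes the rounding mode must restore it before returning.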