+<title>Gentoo Linux Documentation
+ Integrity - Introduction and Concepts</title>
Integrity - Introduction and Concepts
+<p class="chaphead"><a name="doc_chap1"></a><span class="chapnum">1.
+ </span>It is about trust</p>
+<p class="secthead"><a name="doc_chap1_sect1">Introduction</a></p>
+Integrity is about trusting components within your environment, and in our case
+the workstations, servers and machines you work on. You definitely want to be
+certain that the workstation you type your credentials on to log on to the
+infrastructure is not compromised in any way. This "trust" in your environment
+is a combination of various factors: physical security, system security patching
+process, secure configuration, access controls and more.
+Integrity plays a role in this security field: it tries to ensure that the
+systems have not been tampered with by malicious people or organizations. And
+this tamperproof-ness extends to a wide range of components that need to be
+validated. You probably want to be certain that the binaries that are run (and
+libraries that are loaded) are those you built yourself (in case of Gentoo) or
+were provided to you by someone (or something) you trust. And that the Linux
+kernel you booted (and the modules that are loaded) are those you made, and not
+someone else's.
+Most people trust themselves and look at integrity as if it needs to prove that
+things are still as you've built them. But to support this claim, the systems you
+use to ensure integrity need to be trusted too: you want to make sure that
+whatever system is in place to offer you the final yes/no on the integrity only
+uses trusted information (did it really validate the binary) and services (is it
+not running on a compromised system). To support these claims, many ideas,
+technologies, processes and algorithms have been developed over the years.
+In this document, we will talk about a few of those, and how they fit in the
+Gentoo Hardened Integrity subproject's vision and roadmap.
+<p class="chaphead"><a name="doc_chap2"></a><span class="chapnum">2.
+ </span>Hash results</p>
+<p class="secthead"><a name="doc_chap2_sect1">Algorithmically validating a file's content</a></p>
+Hashes are a primary method for validating if a file (or other resource) has
+not been changed since it was first inspected. A hash is the result of a
+mathematical calculation on the content of a file (most often a number or
+ordered set of numbers), and exhibits the following properties:
+ <li>
+ The resulting number is represented in a <span class="emphasis">small (often fixed-size) length</span>.
+ This is necessary to allow fast verification if two hash values are the same
+ or not, but also to allow storing the value in a secure location (which is,
+ more often than not, much more restricted in space).
+ </li>
+ <li>
+ The hash function always <span class="emphasis">returns the same hash</span> (output) when the file it
+ inspects has not been changed (input). Otherwise it'll be impossible to
+ ensure that the file content hasn't changed.
+ </li>
+ <li>
+ The hash function is fast to run (the calculation of a hash result does not
+ take up too much time or even resources). Without this property, it would
+ take too long to generate and even validate hash results, leading to users
+ being dissatisfied (and more likely to disable the validation altogether).
+ </li>
+ <li>
+ The hash result <span class="emphasis">cannot be used to reconstruct</span> the file. Although this is
+ often seen as a result of the first property (small length), it is important
+ because hash results are often also seen as a "public validation" of data
+ that is otherwise private in nature. In other words, many processes rely on
+ the inability of users (or hackers) to reverse-engineer information based on
+ its hash result. Good examples are passwords and password databases, which
+ <span class="emphasis">should</span> store hashes of the passwords, not the passwords themselves.
+ </li>
+ <li>
+ Given a hash result, it is near impossible to find another file with the
+ same hash result (or to create such a file yourself). Since the hash result
+ is limited in space, there are many inputs that will map onto the same
+ hash result. The power of a good hash function is that it is not feasible to
+ find them (or calculate them) except by brute force. When such a match is
+ found, it is called a <span class="emphasis">collision</span>.
+ </li>
+Compared with checksums, hashes try to be cryptographically secure (and as
+such more effort is put into the last property to make sure collisions are very
+hard to obtain). Some even try to generate hash results in a way that the
+time needed to calculate a hash cannot be used to obtain information about the data
+(such as whether it contains more 0s than 1s).
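+The properties listed above can be observed with a few lines of Python. This is
+only a minimal sketch: it assumes SHA-256 from the standard <span class="code" dir="ltr">hashlib</span>
+module as the hash function, and the helper name <span class="code" dir="ltr">sha256_hex</span> is made up for
+this illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_input = b"a"
long_input = b"a" * 1_000_000

# Small, fixed-size result: the hex digest has the same length for both inputs.
print(len(sha256_hex(short_input)), len(sha256_hex(long_input)))

# Same input, same hash: recalculating gives an identical result.
print(sha256_hex(short_input) == sha256_hex(b"a"))

# Changed input: a different byte yields a different digest.
print(sha256_hex(short_input) == sha256_hex(b"b"))
```

+The fixed length is what makes it practical to keep the values in a small,
+well-protected location.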
+<p class="secthead"><a name="doc_chap2_sect2">Hashes in integrity validation</a></p>
+Integrity validation services are often based on hash generation and validation.
+Tools such as <a href="http://www.tripwire.org/">tripwire</a> or <a href="http://aide.sourceforge.net/">AIDE</a> generate hashes of files and
+directories on your systems and then ask you to store them safely. When you want
+the integrity of your systems checked, you provide this information to the
+program (most likely in a read-only manner since you don't want this list to
+be modified while validating) which then recalculates the hashes of the files
+and compares them with the given list. Any changes in files are detected and can
+be reported to you (or the administrator).
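+The generate-then-compare workflow described above can be sketched as follows.
+This is not how tripwire or AIDE are implemented; it is a simplified
+illustration that assumes SHA-256 and only looks at the content of regular
+files. The function names <span class="code" dir="ltr">build_baseline</span> and <span class="code" dir="ltr">compare</span> are hypothetical.

```python
import hashlib
import os


def file_digest(path: str) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root: str) -> dict:
    """Map every file below root (by relative path) to its digest."""
    baseline = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            baseline[os.path.relpath(path, root)] = file_digest(path)
    return baseline


def compare(baseline: dict, current: dict) -> dict:
    """Report which paths were added, removed or changed since the baseline."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(
            path
            for path in baseline.keys() & current.keys()
            if baseline[path] != current[path]
        ),
    }
```

+The baseline would be stored somewhere the monitored system cannot write to,
+and <span class="code" dir="ltr">compare(baseline, build_baseline(root))</span> would be run at validation time.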
+A popular hash function is SHA-1 (which you can generate and validate using the
+<span class="code" dir="ltr">sha1sum</span> command), which gained momentum after MD5 (using <span class="code" dir="ltr">md5sum</span>)
+was found to be less secure (nowadays collisions in MD5 are easy to generate).
+SHA-2 also exists (but is less popular than SHA-1) and can be used through
+the commands <span class="code" dir="ltr">sha224sum</span>, <span class="code" dir="ltr">sha256sum</span>, <span class="code" dir="ltr">sha384sum</span> and
+<span class="code" dir="ltr">sha512sum</span>.
+<a name="doc_chap2_pre1"></a><table class="ntable" width="100%" cellspacing="0" cellpadding="0" border="0">
+<tr><td bgcolor="#7a5ada"><p class="codetitle">Code Listing2.1: Generating the SHA-1 sum of a file</p></td></tr>
+<tr><td bgcolor="#eeeeff" align="left" dir="ltr"><pre>
+~$ <span class="code-input">sha1sum ~/Downloads/pastie-4301043.rb</span>
+6b9b4e0946044ec752992c2afffa7be103c2e748 /home/swift/Downloads/pastie-4301043.rb
+</pre></td></tr>
+</table>
+<p class="secthead"><a name="doc_chap2_sect3">Hashes are a means, not a solution</a></p>
+Hashes, in the field of integrity validation, are a means to compare data and
+integrity in a relatively fast way. However, by themselves hashes cannot be used to
+provide integrity assurance towards the administrator. Take the use of
+<span class="code" dir="ltr">sha1sum</span> by itself for instance.
+You are not guaranteed that the <span class="code" dir="ltr">sha1sum</span> application behaves correctly
+(and as such has or hasn't been tampered with). You can't use <span class="code" dir="ltr">sha1sum</span>
+against itself since malicious modifications of the command can easily just
+return (print out) the expected SHA-1 sum rather than the real one. A way to
+thwart this is to provide the binary together with the hash values on read-only
+media.
+But then you're still not certain that it is that application that is executed:
+a modified system might have you think it is executing that application, but
+instead is using a different application. To provide this level of trust, you
+need to get assurance from a higher-positioned, trusted service that the right
+application is being run. Running with a trusted kernel helps here (but might
+not provide 100% closure on it) but you most likely need assistance from the
+hardware (we will talk about the Trusted Platform Module later).
+Likewise, you are not guaranteed that it is still your file with hash results
+that is being used to verify the integrity of a file. Another file (with
+modified content) may be bind-mounted on top of it. To support integrity
+validation with a trusted information source, some solutions use HMAC digests
+instead of plain hashes.
+Finally, checksums should not only be taken of file content, but also of file
+attributes (which are often used to provide access controls or even toggle
+particular security measures on/off on a file, such as is the case with PaX
+markings), directories (holding information about directory updates such
+as file additions or removals) and privileges. These are things that a program like
+<span class="code" dir="ltr">sha1sum</span> doesn't offer (but tools like AIDE do).
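+As a rough illustration of covering more than file content, the sketch below
+folds the file type, permission bits and ownership into the hash, and hashes
+the list of entry names for directories. The name <span class="code" dir="ltr">entry_fingerprint</span> is
+hypothetical, and extended attributes (where markings such as PaX flags may
+live) are deliberately left out to keep the example short.

```python
import hashlib
import os
import stat


def entry_fingerprint(path: str) -> str:
    """Hash the metadata of a path together with its content or entry list."""
    info = os.lstat(path)  # lstat: describe a symlink itself, not its target
    digest = hashlib.sha256()

    # Metadata first: file type, permission bits, owner and group.
    metadata = "{}:{}:{}:{}".format(
        stat.S_IFMT(info.st_mode),
        stat.S_IMODE(info.st_mode),
        info.st_uid,
        info.st_gid,
    )
    digest.update(metadata.encode("ascii"))
    digest.update(b"\0")

    if stat.S_ISREG(info.st_mode):
        # Regular file: add the content, read in chunks.
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
    elif stat.S_ISDIR(info.st_mode):
        # Directory: add the sorted entry names so adds and removals show up.
        for name in sorted(os.listdir(path)):
            digest.update(os.fsencode(name) + b"\0")

    return digest.hexdigest()
```

+With this approach a permission change alone (for instance through <span class="code" dir="ltr">chmod</span>)
+results in a different fingerprint, even though the file content is untouched.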
+<p class="chaphead"><a name="doc_chap3"></a><span class="chapnum">3.
+ </span>Hash-based Message Authentication Codes</p>
+<p class="secthead"><a name="doc_chap3_sect1">Trusting the hash result</a></p>
+In order to trust a hash result, some solutions use HMAC digests instead. An
+HMAC digest combines a regular hash function (and its properties) with a
+secret cryptographic key. As such, the function generates the hash of the
+content of a file together with the secret cryptographic key. This not only
+provides integrity validation of the file, but also a signature telling the
+verification tool that the hash was made by a trusted application (one that
+knows the cryptographic key) in the past and has not been tampered with.
+By using HMAC digests, malicious users will find it more difficult to modify
+code and then present a "fake" hash results file since the user cannot reproduce
+the secret cryptographic key that needs to be added to generate this new hash
+result. When you see terms like <span class="emphasis">HMAC-SHA1</span> it means that a SHA-1 hash
+result is used together with a cryptographic key.
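+A minimal sketch of generating and checking an HMAC digest with Python's
+standard <span class="code" dir="ltr">hmac</span> module is shown below. It assumes HMAC-SHA256; passing
+<span class="code" dir="ltr">hashlib.sha1</span> instead would give HMAC-SHA1. The helper names and the
+example key are made up for this illustration.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, content: bytes) -> str:
    """Return the HMAC-SHA256 digest of content under key, as hex."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(key: bytes, content: bytes, expected_hex: str) -> bool:
    """Recalculate the digest and compare it in constant time."""
    return hmac.compare_digest(hmac_sha256(key, content), expected_hex)


# Example key for illustration only; a real key must be kept secret.
secret_key = b"example-key-not-for-real-use"
stored = hmac_sha256(secret_key, b"file content")

print(verify(secret_key, b"file content", stored))     # unchanged content
print(verify(secret_key, b"tampered content", stored)) # modified content
print(verify(b"attacker-key", b"file content", stored)) # wrong key
```

+Without the key, recalculating a matching digest for modified content is not
+feasible, which is what distinguishes this from a plain hash list.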
+<p class="secthead"><a name="doc_chap3_sect2">Managing the keys</a></p>
+Using keys to "protect" the hash results introduces another level of complexity:
+how do you properly, securely store the keys and access them only when needed?
+You cannot just embed the key in the hash list (since a tampered system might
+read it out when you are verifying the system, generate its own results file and
+have you check against that instead). Likewise you can't just embed the key in
+the application itself, because a tampered system might just read out the
+application binary to find the key (and once compromised, you might need to
+rebuild the application completely with a new key).
+You might be tempted to just provide the key as a command-line argument, but
+then again you cannot be certain that no malicious user is idling on your system,
+waiting to capture this valuable information from the output of <span class="code" dir="ltr">ps</span>, etc.
+Again the need arises to trust a higher-level component. When you trust the
+kernel, you might be able to use the kernel key ring for this.
+<p class="chaphead"><a name="doc_chap4"></a><span class="chapnum">4.
+ </span>Using private/public key cryptography</p>
+<p class="secthead"><a name="doc_chap4_sect1">Validating integrity using public keys</a></p>
+One way to work around the vulnerability of a malicious user getting
+hold of the secret key is to not rely on the key for the authentication of the
+hash result in the first place when verifying the integrity of the system. This
+can be accomplished if, instead of using just an HMAC, you also encrypt the HMAC
+digest with a private key.
+During validation of the hashes, you decrypt the stored HMAC digest with the public key (not
+the private key) and compare it with the HMAC digest you generate again from the file.
+In this approach, an attacker cannot forge a fake HMAC since forgery requires
+access to the private key, and the private key is never used on the system to
+validate signatures. And as long as no collisions occur, the attacker also cannot reuse
+the encrypted HMAC values (which you could consider to be a replay attack).
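+To make the private/public split concrete, here is a toy sketch using
+"textbook" RSA in pure Python. It is an illustration under strong assumptions,
+not a usable signature scheme: the key is built from two small, publicly known
+Mersenne primes, no padding is applied, and a plain SHA-256 digest is signed
+instead of an HMAC digest to keep the example short. All names are
+hypothetical, and the modular inverse through <span class="code" dir="ltr">pow</span> needs Python 3.8 or later.

```python
import hashlib

# Toy key material: two known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept off the system


def digest_as_int(content: bytes) -> int:
    """Reduce the SHA-256 digest of content to an integer below N."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content: bytes) -> int:
    """Done where the private key lives: 'encrypt' the digest with D."""
    return pow(digest_as_int(content), D, N)


def verify(content: bytes, signature: int) -> bool:
    """Done on the validated system: needs only the public values E and N."""
    return pow(signature, E, N) == digest_as_int(content)
```

+Only <span class="code" dir="ltr">verify</span>, <span class="code" dir="ltr">E</span> and <span class="code" dir="ltr">N</span> need to be present on the system being
+checked; a real deployment would rely on a vetted cryptographic library rather
+than code like this.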
+<p class="secthead"><a name="doc_chap4_sect2">Ensuring the key integrity</a></p>
+Of course, this still requires that the public key is not modifiable by a
+tampered system: a fake list of hash results can be made using a different
+private key, and the moment the tool wants to decrypt the encrypted values, the
+tampered system replaces the public key with its own public key, and the system
+is again vulnerable.
+<p class="chaphead"><a name="doc_chap5"></a><span class="chapnum">5.
+ </span>Trust chain</p>
+<p class="secthead"><a name="doc_chap5_sect1">Handing over trust</a></p>
+As you've noticed from the methods and services above, you always need to have
+something you trust and that you can build on. If you trust nothing, you can't
+validate anything since nothing can be trusted to return a valid response. And
+to trust something means you also want to have confidence that that system
+itself uses trusted resources.
+For many users, the hardware level is something they trust. After all, as long
+as no burglar has come in the house and tampered with the hardware itself, it is
+reasonable to expect that the hardware is still the same. In effect, the users
+trust that the physical protection of their house is sufficient for them.
+For companies, the physical protection of the working environment is not
+sufficient for ultimate trust. They want to make sure that the hardware is not
+tampered with (or different hardware is suddenly used), specifically when that
+company uses laptops instead of (less portable) workstations.
+The more you don't trust, the more things you need to take care of in order to
+be confident that the system is not tampered with. In the Gentoo Hardened
+Integrity subproject we will use the following "order" of resources:
+ <li>
+ <span class="emphasis">System root-owned files and root-running processes</span>. In most cases
+ and most households, properly configured and protected systems will trust
+ root-owned files and processes. Any request for integrity validation of
+ the system is usually applied against user-provided files (no-one tampered
+ with the user account or specific user files) and not against the system
+ itself.
+ </li>
+ <li>
+ <span class="emphasis">Operating system kernel</span> (in our case the Linux kernel). Although some
+ precautions need to be taken, a properly configured and protected kernel can
+ provide a higher trust level. Integrity validation on kernel level can offer
+ a higher trust in the system's integrity, although you must be aware that
+ most kernels still reside on the system itself.
+ </li>
+ <li>
+ <span class="emphasis">Live environments</span>. A bootable (preferably) read-only medium can be
+ used to boot up a validation environment that scans and verifies the
+ integrity of the system-under-investigation. In this case, even tampered
+ kernel boot images can be detected, and by taking proper precautions when
+ running the validation (such as ensuring no network access is enabled from
+ the boot up until the final compliance check has occurred) you can make
+ yourself confident of the state of the entire system.
+ </li>
+ <li>
+ <span class="emphasis">Hypervisor level</span>. Hypervisors are seen by many organizations as
+ trusted resources (the isolation of a virtual environment is hard to break
+ out of). Integrity validation on the hypervisor level can therefore provide
+ confidence, especially when "chaining trusts": the hypervisor first
+ validates the kernel to boot, and then boots this (now trusted) kernel which
+ loads up the rest of the system.
+ </li>
+ <li>
+ <span class="emphasis">Hardware level</span>. Whereas hypervisors are still "just software", you
+ can lift trust up to the hardware level and use the hardware-offered
+ integrity features to provide you with confidence that the system you are
+ about to boot has not been tampered with.
+ </li>
+In the Gentoo Hardened Integrity subproject, we aim to eventually support all
+these levels (and perhaps more) to provide you as a user the tools and methods
+you need to validate the integrity of your system, up to the point that you
+trust. The less you trust, the more complex a trust chain might become to
+validate (and manage), but we will not limit our research and support to a
+single technology (or chain of technologies).
+Chaining trust is an important aspect to keep things from becoming too complex
+and unmanageable. It also allows users to just "drop in" at the level of trust
+they feel is sufficient, rather than requiring technologies for higher levels.
+For instance:
+ <li>
+ A hardware component that you trust (like a <span class="emphasis">Trusted Platform Module</span>
+ or a specific BIOS-supported functionality) verifies the integrity of the
+ boot regions on your disk. When ok, it passes control over to the
+ bootloader.
+ </li>
+ <li>
+ The bootloader now validates the integrity of its configuration and of the
+ files (kernel and initramfs) it is told to boot up. If it checks out, it
+ boots the kernel and hands over control to this kernel.
+ </li>
+ <li>
+ The kernel, together with the initial ram file system, verifies the
+ integrity of the system components (and for instance SELinux policy) before
+ the initial ram system changes to the real system and boots up the
+ (verified) init system.
+ </li>
+ <li>
+ The (root-running) init system validates the integrity of the services it
+ wants to start before handing over control of the system to the user.
+ </li>
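+The hand-over pattern in the steps above can be sketched as a loop in which
+each stage is measured and compared against a known-good digest before it is
+"started". This is a simplified model: in a real chain every verified
+component holds the reference value for the next one, whereas here a single
+dictionary stands in for that. The names <span class="code" dir="ltr">boot_chain</span> and
+<span class="code" dir="ltr">IntegrityError</span> are hypothetical.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when a stage does not match its known-good digest."""


def measure(blob: bytes) -> str:
    """Measure a component by hashing its content."""
    return hashlib.sha256(blob).hexdigest()


def boot_chain(stages, known_good):
    """Verify each stage in boot order; stop at the first mismatch.

    stages is a list of (name, content) pairs, known_good maps each name
    to the digest that the previous, already trusted stage expects.
    """
    started = []
    for name, blob in stages:
        if measure(blob) != known_good[name]:
            raise IntegrityError(name + " failed verification, boot halted")
        started.append(name)  # stand-in for handing over control
    return started
```

+Because the loop stops at the first mismatch, a tampered kernel means the init
+system is never reached, which mirrors how control is only handed over after a
+successful check.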
+An even longer chain can be seen with hypervisors:
+ <li>
+ Hardware validates boot loader
+ </li>
+ <li>
+ Boot loader validates hypervisor kernel and system
+ </li>
+ <li>
+ Hypervisor validates kernel(s) of the images (or the entire images)
+ </li>
+ <li>
+ Hypervisor-managed virtual environment starts the image
+ </li>
+ <li>
+ ...
+ </li>
+<p class="secthead"><a name="doc_chap5_sect2">Integrity on serviced platforms</a></p>
+Sometimes you cannot trust higher positioned components, but still want to be
+assured that your service is not tampered with. An example would be when you are
+hosting a system in a remote, non-accessible data center or when you manage an
+image hosted by a virtualized hosting provider (I don't want to say "cloud"
+here, but it fits).
+In these cases, you want a level of assurance that your own image has not been
+tampered with while being offline (you can imagine manipulating the guest image,
+injecting trojans or other backdoors, and then booting the image) or even while
+running the system. Instead of trusting the higher components, you try to deal
+with a level of distrust that you want to manage.
+Providing you with some confidence at this level too is our goal within the
+Gentoo Hardened Integrity subproject.
+<p class="secthead"><a name="doc_chap5_sect3">From measurement to protection</a></p>
+When dealing with integrity (and trust chains), the idea behind the top-down
+trust chain is that higher level components first measure the integrity of the
+next component, validate (and take appropriate action) and then hand over
+control to this component. This is what we call <span class="emphasis">protection</span> or
+<span class="emphasis">integrity enforcement</span> of resources.
+If the system cannot validate the integrity, or the system is too volatile to
+enforce this integrity from a higher level, it is necessary to provide a trusted
+method for other services to validate the integrity. In this case, the system
+<span class="emphasis">attests</span> the state of the underlying component(s) towards a third party
+service, which <span class="emphasis">appraises</span> this state against a known "good" value.
+In the case of our HMAC-based checks, there is no enforcement of integrity of
+the files, but the tool itself attests the state of the resources by generating
+new HMAC digests and validating (appraising) them against the list of HMAC digests
+it took before.
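+The split between attesting and appraising can be sketched as two separate
+functions: one that only reports the current state as HMAC digests, and one
+that judges such a report against the known "good" list. This is an
+illustrative sketch assuming HMAC-SHA256; the function names and the in-memory
+dictionaries are assumptions made for the example.

```python
import hashlib
import hmac


def attest(key: bytes, resources: dict) -> dict:
    """Report the current state: one fresh HMAC digest per resource name."""
    return {
        name: hmac.new(key, content, hashlib.sha256).hexdigest()
        for name, content in resources.items()
    }


def appraise(report: dict, known_good: dict) -> list:
    """Return the names that are missing, unexpected or differ in digest."""
    return sorted(
        name
        for name in known_good.keys() | report.keys()
        if not hmac.compare_digest(report.get(name, ""), known_good.get(name, ""))
    )
```

+Note that nothing is blocked here: the result is a list of deviations, and it
+is up to whoever receives it to decide what action to take.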
+<p class="chaphead"><a name="doc_chap6"></a><span class="chapnum">6.
+ </span>An implementation: the Trusted Computing Group functionality</p>
+<p class="secthead"><a name="doc_chap6_sect1">Trusted Platform Module</a></p>
Page updated July 30, 2012
+<tr><td class="topsep" align="left"><p class="alttext"><b>Summary: </b>
+Integrity validation is a wide field in which many technologies play a role.
+This guide aims to offer a high-level view on what integrity validation is all
+about and how the various technologies work together to achieve a (hopefully)
+more secure environment to work in.
Sven Vermeulen
