Criminal IT: Why insecurity is implicit in computing

It's the way the machines were designed...

By Neil Barrett, 18 May 2005 13:10

COMMENT To explain why antivirus software is so limited and malicious code so hard to detect, Neil Barrett goes back to the very origins of modern computing.

Some statements are undoubtedly true; I am an adult male. Others undoubtedly false; I can breathe underwater. And some of them need more information; I live in a house with a green-tiled bathroom. You can visit my house, you can ask my family; it is decidable, provided that you can get some more information.

There are, however, some statements that are entirely un-decidable; even with extra information - or indeed, with every piece of information possible - these statements cannot be determined to be true or false. The most famous example is: 'This statement is false.' If it's true, it's false; if it's false, it's true. It is a perfectly well-formed, grammatically correct statement for which we cannot assign a true or false value.

These are the sorts of statements to confuse children but they proved to be an anathema to mathematicians, whose purpose in life is to determine whether or not well-formed mathematical expressions are true or false. If statements such as 'this statement is false' have analogues in the language of mathematics, then there are expressions that cannot be decided - and so the mathematicians have set themselves an impossible task. This became important at the start of the 20th Century, when mathematicians such as David Hilbert and Kurt Gödel tried to establish mathematics on the most rigorous possible foundations; they were trying to prove that mathematics was complete and self-consistent.

Unfortunately, they managed instead to prove that it wasn't - but on the road to that discovery, they managed to invent computers, help to win World War II and built our modern 'information age'.

The person who, more than any other, was responsible for laying the foundations of this work was the incomparable genius, Alan Turing. He produced not only an answer to the Decidability Problem but did so with an imaginative technique that showed us how to build automatic mathematicians - computers which first solved the puzzle of the Enigma cipher machines, turning the tide of the U-Boat war, and then solved everything from payroll to rocket launches. He did this by designing what has come to be known as a 'Turing Machine', an idealised version of a mathematician manipulating mathematical symbols - turning statements into alternative statements, converging on an answer to any form of calculation that can be expressed.

The machine has an infinitely long tape containing discrete cells, each of which contains a symbol. It also has a unit which can read or write those symbols, with the position and value of the written symbol determined by a small set of simple rules based on the values read. The unit moves along the tape, reading and altering the symbols so as to change the contents of the tape from the original 'problem' into the final 'answer'. When the final answer is produced, the Turing Machine halts. Any problem for which the machine halts is soluble, or 'decidable'; any for which it doesn't are 'un-decidable'. Turing proved that there were well-expressed mathematical statements for which the machine would never reach the halting state - and so there are indeed un-decidable statements.

The Turing Machine is an enormously powerful concept, providing mathematicians with a simple, mechanistic way of considering their task. But a Turing Machine that is actually constructed is an even more powerful device; it becomes an entirely automatic mathematician, a programmable computer able to perform any calculation. Perhaps the most important step in this direction was taken by Turing's former colleague at Princeton, John von Neumann, who produced the design to implement the machine physically and who gave us the modern computer.

In the language of mathematics, a computer program is a Turing Machine and a computer - a device able to run any Turing Machine - is called a 'Universal Turing Machine', a mathematical model able to interpret any mathematical model. One feature of Turing's work, though, was to show that there are programs which cannot be run to completion - programs which are the analogues of 'this statement is false' and cannot therefore be decided.

What does this mean for information security? In essence, we want to know in advance whether a given sequence of changes to data - a program - is going to be 'harmful' or not. Unfortunately, in 1986, Fred Cohen managed to prove that the problem of determining whether a piece of program was a virus was indeed an un-decidable problem. A Turing Machine to run the analysis would never halt; the only way to determine the effect of the program was actually to run it and see.

This is an enormously important result. It means the task of the antivirus program is mathematically impossible. By extension, the task of determining in advance whether any program will have a harmful effect is equally impossible. The only way of establishing what a program will in fact do is to allow it to execute and then to look at the state of the computer's memory. If the task of information security is to predict whether a given program will have a harmful effect on the data state, then this is impossible.

The implication is that the mathematical model of computation has insecurity implicit within it because we know that we cannot know ahead of time whether something will or will not be damaging. To borrow a phrase from my colleague Stephen Castell, computers are "ontologically insecure" - whether they are built on the von Neumann or any other architectural model.

Of course, this isn't the final word. Antivirus programs do exist and perform a vitally important task - as do intrusion detection systems, segregated memory models and a host of other solutions. But we should be conscious that each of these solutions can do nothing more than plug the occasional hole in the dyke. For true information security, we need to call on something that isn't bound by the Turing Machine model: we need to step outside the problem and examine it differently; we need to apply the real-world, human perspective to it - a perspective that is fascinated but not foxed by 'this statement is false'.

Comments

There are 10 comments. Join the discussion

  1. 1. John H Woods

    Hey, gizza job!

    Your work sounds fascinating, and your article is extremely interesting. I do take issue with a couple of your points, however.

    On a pedantic note, I think that 'this statement is false' is paradoxical, not undecidable. I think you need something like 'this statement is true but not provable' to demonstrate undecidability. (email argument welcome!)

    Whilst I agree that it is impossible to determine whether a given piece of code is a virus, it is perfectly possible to observe the /behaviour/ of pieces of code and determine which of them are viruses.

    It's pretty much the same story with DNA -- you can only tell genetic material is from a virus because it is similar to genetic material that is known to be from (other) viruses. But the behaviour of the organism/entity in the environment makes it easy to classify it as a virus or not.

    I just wish I had an opportunity to apply my original biological knowledge more effectively to IT. I'm sure that as the complexity of IT systems increases, a biological approach (such as an ecological networks analysis) becomes more and more appropriate to manage and understand the complexities and the emergent behaviours.

    ...John

  2. 2. Neil Barrett

    Re: 'Hey, gizza job'

    Thanks for the comment; people taking issue are always welcomed!

    You're right, of course, that "this statement is true" is paradoxical rather than undecidable. To be (pedantically!) correct, I should have said that we have two sets: the set of all true statements and the set of all false statements. Assigning "this statement is true" to one or other set is then undecidable. But with only 1,000 words to play with, I simplified it!

    You're also right that we have two ways of determining whether a biological virus is harmful. Our immune system can 'recognise' it as an example of a harmful thing previously encountered and then kill it; or we can allow it to live and see whether it does anything harmful. The problem, of course, is that the harmful activity might be permanently damaging to the host body.

    The analogy holds for computation. We can scan the code looking for previously identified sequences of known harmful instructions; or we can execute it and see what it does. Of course, the 'what it does' might then again be permanently damaging, this time to our data set.


    Neil

  3. 3. Neil Barrett

    Part two...

    In the computational approach, there is a third thing which we can do: we can execute it in a controlled environment. The computational equivalent of a laboratory test on a tissue sample.

    Computationally, though, the problem is that the 'controlled environment' is itself a program which is necessarily larger than the tested program. Again, computationally we must satisfy ourselves that the system consisting of the testing and tested programs is not itself harmful - which requires a yet bigger program. And that too needs to be tested with yet another, yet larger program... etc.

    That's the correct argument for non-computability which I rather skimmed over in the article.

    I also skimmed over the 'what is security' argument - I might keep that for another article! Grin.

    As to an email conversation on this, more than happy. My email is neil dot barrett at btinternet dot com. Comments and critiques welcomed...

    Neil

  4. 4. Nick Cole

    This highly relevant and interesting article illustrates how the more complex something is the greater likelihood of a major unforeseen problem occuring.

    We do not help matters by insisting on automating and providing the means of automatic connection outwith the control of the operators of critical functions on our machines. The interdependency of all these items is frightening if thought about. Because we allow programmes to interact with the operating system, hardware and wider network/internet through macros, embedded html, or whatever we create the very conditions that cause these problems.

    The conundrum is that the most secure computer is one that doesn't interact with any other, and possibly with anybody as well! Human nature is a significant factor but so is the deliberate or inadvertent coding of systems that allows remote execution, with the inability of any collective error trapping mechanism to monitor every conceivable fault opportunity.

    If people realised how vulnerable things are they would be more careful.

  5. 5. Mike W

    Surely the point of a Turing machine is not that it is 'powerful', but that it is simple enough for its basic operations and sequences thereof to be completely analysed in a mathematical way, but with enough expressive power that it can compute any required algorithm (eventually !!).

    One can then apply transformations to demonstrate that various 'real' (or conceptual) computing devices are computationally equivalent to a TM, and therefore any theorem proved for a TM is true for any 'real' device.

    In this way, one can make proofs within a very restricted environment, but have them shown more generally valid by dint of the proven equivalence.

  6. 6. Neil Barrett

    I think you just proved that Asimov's Laws of Robotics could never exist?

    PS - I'm not an identity thief - I am another Neil Barrett (born 1959)...

    But it is nice to see 'our' name getting such a high profile!!

  7. 7. Stephen Castell

    As the colleague referred to in Neil’s article, some background on 'legal reliability' of IT. I wrote the APPEAL Report (1991) on admissibility of digital evidence in court, and coined the term 'ontologically insecure' in 'Computers trusted, and found wanting', Computer Law and Security Report, [1993], 9 - 'Dr Stephen Castell reflects on the adequacy of existing technology to deliver legally reliable output and calls for change in systems design' (see also CLSR, [1994], 10, p.158, 'A computer of the simplest kind...').

    When I say ‘ontologically insecure’, I mean ‘as a matter of existence, of actuality’. Von Neumann ‘open architecture’ has made the computer industry able to be constantly and unnervingly innovative; but it is also what makes it ontologically insecure – as a matter simply of its existence, of its very essence. Thus, you can’t expect a computer to be anything else but insecure, and ‘You can’t trust a computer, nor anything that comes out of it’ (including ID cards!). It’s time for a new non-ontologically-insecure non-Von Neumann computer architecture, folks.

    Stephen Castell
    http://www.e-expertwitness.com
    http://www.midentity.com/StephenCastel

  8. 8. anonymous

    I don't trust anyone with green bathroom tiles...

  9. 9. Steve Cowles

    Re: Part 2

    I agree that the "Russian Doll" approach of checking software is madness, but if the checking software is built into the Hardware Layer (i.e Can't be overwritten & therefore trusted) doesn't this stop the infinite validation loop?

  10. 10. Mike Oldman

    "I can breathe underwater" is not false it is incomplete. Didn't somebody invent an artificial gill to extract the small amount of oxygen in water, just like fish do?

Post your comment

In order to post a comment you need to be registered and logged in.

Log in or create your silicon.com account below

Will not be displayed with your comment

By signing up for this service, you indicate that you agree to our Terms and Conditions and have read and understood our Privacy Policy.

Questions about membership? Find the answers in the Membership FAQ