Complexity — We Don’t Know Anything Yet?
This is Part 6 of the series "How To Explain Complexity Theory To Your Buddy". You can check all my posts here:
Stephen Hawking once said that the 21st century will be the century of complexity. All in all, much of complexity theory still rests on unproven conjectures, which we treat as true until proven false. However, we can be fairly confident that the field is moving in the right direction.
The advances in computer science have led to many interesting explorations, but I want to point out the two problems I find most interesting:
- The Halting Problem
- P != NP
I will keep this post as short as possible, so you can do your own research afterwards; I'll leave some amazing discoveries here:
The Halting Problem
Philosophy:
- You cannot write a program that has total control over itself: a program cannot, in general, make a statement about its own performance, output, or internal behaviour, or guarantee that it satisfies such a criterion.
Explanation:
- The question is whether the halting problem is decidable or not. It is proven to be undecidable, and you can find very clever proofs on the internet showing that a computer science in which the halting problem were solved could not work consistently.
- Consider the language of Turing Machines that accept (halt on) their own binary code. This language is recognizable: a machine can simulate the given TM on its own encoding and accept if that simulation accepts. But if the simulation has not accepted yet, we can never be sure it won't, because it might still be running; therefore the language is not decidable.
- In simpler terms: whenever a self-referential criterion about the machine's behaviour, such as "the TM accepts exactly 5 words", "the TM halts on 0101", or "the TM doubles its input", defines the language we want to decide, that language is almost always undecidable (for non-trivial properties of the recognized language this is exactly Rice's theorem).
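To make the proof idea concrete, here is a minimal Python sketch of the classic diagonalization argument. The names `halts` and `paradox` are mine, and `halts` is only the hypothetical decider we assume to exist for the sake of contradiction:

```python
def halts(program_source: str, input_data: str) -> bool:
    """Hypothetical decider for the Halting Problem (it cannot actually exist)."""
    raise NotImplementedError("assumed only for the sake of contradiction")

def paradox(program_source: str) -> str:
    # Do the opposite of what the decider predicts about "program run on itself":
    if halts(program_source, program_source):
        while True:          # predicted to halt -> loop forever
            pass
    return "halted"          # predicted to loop -> halt immediately

# Feeding paradox its own source code is contradictory either way,
# so no total, always-correct implementation of halts() can exist.
```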
Consequences If It Were Otherwise
- We could write a Turing Machine that decides whether a given Turing Machine accepts its own binary code.
- We could settle whether every even number greater than 2 is the sum of two primes (the Goldbach Conjecture), simply by asking whether a program that searches for a counterexample ever halts (see the sketch after this list).
- We could use diagonalization to build a computable total function that differs from every total computable function, as if, at the edge of infinity, the computable functions grew faster than the natural numbers (normally they are countable: by the Church-Turing Thesis every computable function corresponds to a Turing Machine, and there are only countably many of those).
- We might have to redefine the concept of infinity in mathematics.
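Here is the promised sketch for the Goldbach bullet (my own illustration): a program whose halting behaviour encodes the conjecture, so a working halting decider applied to it would settle Goldbach once and for all.

```python
def is_prime(k: int) -> bool:
    """Trial division; slow, but enough for the thought experiment."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def goldbach_search() -> int:
    """Halts if and only if some even number > 2 is NOT a sum of two primes."""
    n = 4
    while True:
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1)):
            return n                      # a counterexample to Goldbach
        n += 2

# Asking "does goldbach_search() halt?" is asking whether the conjecture is false.
```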
Quines and Halting Problem
You have probably seen programs that print their own source code; they are called quines. There is an interesting connection between quines and the Halting Problem: the Halting Problem says a program cannot control itself, yet quines can print themselves. Does a quine having "control" over itself solve the Halting Problem?
Source code can be represented by a very large integer, which is why the execution of a program can be seen as an integer function q where:
- n is the integer encoding the source code
- q(n) is the integer encoding the output
Programming languages have different features, so this function varies from language to language. In the case of a quine, the integers n and q(n) happen to be the same. This is not a program controlling its own output; it is simply a fixed point where the source code and the output code coincide (q(n) = n). A quine never inspects or predicts its own execution, so the Halting Problem remains unsolved.
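For completeness, here is one classic Python quine, one of many possible forms; it is exactly a source code n with q(n) = n:

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints those two lines verbatim. Note that adding even a single comment would break the fixed-point property, which shows how carefully a quine has to mirror itself rather than "control" anything.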
P != NP
Philosophy:
- If you don't know the whole system, you cannot find an optimal solution as fast as someone who does know the whole system.
Explanation:
- You cannot find an optimal solution to a problem in a system you know only partially, where you have to ask your neighbours, who ask their neighbours, and so on, as fast as you could if you knew the whole system and searched for the optimal solution directly.
- Let's assume we want to hard-code the outcome of every circumstance; in other words, we write down every piece of information in advance so we can look everything up in polynomial time. That cannot work: the number of possible instances grows exponentially with the input size, far too fast to ever store, which is why implementing an efficient algorithm is still the most plausible route to proving P = NP.
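A tiny sketch of this asymmetry using Subset Sum (function names are my own): verifying a proposed solution is fast, while the only general way we know to find one is to search an exponentially large space.

```python
from itertools import combinations

def verify_certificate(nums, target, indices):
    """Checking a proposed subset takes polynomial time: O(n)."""
    return sum(nums[i] for i in set(indices)) == target

def find_subset_brute_force(nums, target):
    """Finding a subset by exhaustive search inspects up to 2^n candidates."""
    for r in range(len(nums) + 1):
        for combo in combinations(range(len(nums)), r):
            if sum(nums[i] for i in combo) == target:
                return combo
    return None

nums = [3, 34, 4, 12, 5, 2]
print(verify_certificate(nums, 9, (2, 4)))   # True: nums[2] + nums[4] = 4 + 5 = 9
print(find_subset_brute_force(nums, 9))      # (2, 4), found only after many tries
```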
Consequences If It Were Otherwise
- We could find the shortest tour in the Traveling Salesman Problem in a reasonable time without having tried every route.
- For any single-player game, we could implement an algorithm that finds a solution as fast as a bot that has already played every combination of the game and can simply look up the outcome for any given position.
- We could find the perfect boyfriend or girlfriend in a reasonable time without personally knowing everybody around the world, and so on.
- We could prove a polynomial upper time bound for problems that currently only have pseudopolynomial algorithms, such as Knapsack and Subset Sum, which means those problems would take only polynomial time regardless of the magnitude of the input values.
P = NP: Not Yet But Might Be?
SAT was the first problem proven to be NP-complete (the Cook-Levin theorem), and the most efficient SAT algorithms known are still far from proving P = NP. Therefore we look at other NP-complete problems, reached from SAT by reductions, and try to find an efficient algorithm for one of them, which would prove P = NP.
Let’s see an example below:
Pseudopolynomial vs Polynomial: What Is The Difference?
Suppose the input array A has n elements and the largest number in A is m. Let's say we want to sort this array with two different approaches, Insertion Sort and Count Sort:
- Insertion Sort has a time complexity O(n²) where n is the size of the array
- Count Sort has a time complexity O(n + m) where n is the size of the array and m is the largest value in that array
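Here is a minimal sketch of Count Sort for non-negative integers (my own simplified version), just to make the O(n + m) bound visible in the code:

```python
def count_sort(a):
    """Sort non-negative integers in O(n + m) time, where m = max(a)."""
    if not a:
        return []
    m = max(a)
    counts = [0] * (m + 1)        # this O(m) table is the pseudopolynomial part
    for x in a:                   # O(n)
        counts[x] += 1
    result = []
    for value, count in enumerate(counts):   # O(m) iterations in total
        result.extend([value] * count)       # O(n) output in total
    return result

print(count_sort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
```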
Which one is better?
- If the largest value m is bounded by something that grows asymptotically slower than n², Count Sort is better, because its O(n + m) running time is then guaranteed to stay below O(n²).
- If m is always a constant multiple of n², the two hardly differ: Count Sort's running time becomes quadratic as well.
- If m grows even faster than n², Count Sort is strictly worse than Insertion Sort. Say m = c·n⁹: then Count Sort's running time becomes O(n⁹).
Although Count Sort looks tempting with its "linear" complexity, it is hard to explain to the end user why it suddenly takes so long in bad cases. An "arbitrarily bad" worst case is very undesirable behaviour for a program, especially when we never see the actual values at all, for example in machine-level code.
Is Pseudopolynomial Complexity Good Enough for P = NP?
Subset Sum, an NP-complete problem, has a pseudopolynomial algorithm, which means its upper time bound depends on the magnitude of the input values. As we discussed above, that is a serious problem for general use cases. So the answer is no, pseudopolynomial is not enough (yet).
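To make "pseudopolynomial" concrete, here is a sketch of the classic dynamic program for Subset Sum (my own minimal version). Its running time O(n · target) is polynomial in the value of the target, but exponential in the number of bits needed to write that value down:

```python
def subset_sum(nums, target):
    """Decide Subset Sum in O(n * target) time and O(target) space."""
    reachable = [False] * (target + 1)
    reachable[0] = True                        # the empty subset sums to 0
    for x in nums:
        for s in range(target, x - 1, -1):     # backwards: use each number once
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))    # True  (4 + 5 = 9)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))   # False
```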
Is There Any Hope?
At first sight, pseudopolynomial might look very similar to polynomial complexity, but it is not once you describe the worst case asymptotically: the maximum value can be arbitrarily big, and under the assumption P != NP it is believed that no equivalent algorithm with a polynomial upper time bound exists.
Still, there is hope that such an algorithm may be out there, waiting to be found. An important example is 2-SAT: unlike general SAT and 3-SAT, 2-SAT can be solved in polynomial time. The reason there is no known reduction from 3-SAT to 2-SAT is that encoding 3-SAT clauses blows the number of 2-SAT clauses up exponentially, so there is no known way to store the meaningful information of a 3-SAT instance in 2-SAT efficiently.
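To back up the 2-SAT claim, here is a sketch of the standard polynomial-time approach, the implication graph plus strongly connected components (the code and its names are my own illustration): a clause (a ∨ b) becomes the implications ¬a → b and ¬b → a, and the instance is unsatisfiable exactly when some variable lands in the same component as its negation.

```python
def two_sat(n, clauses):
    """Solve a 2-SAT instance with n variables in linear time.

    clauses: list of pairs of literals, where literal +i means x_i and
    -i means NOT x_i. Returns a satisfying assignment as {variable: bool},
    or None if the instance is unsatisfiable.
    """
    def node(lit):                      # graph node index of a literal
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    size = 2 * n
    graph = [[] for _ in range(size)]   # implication graph
    rgraph = [[] for _ in range(size)]  # its transpose
    for a, b in clauses:                # (a or b) == (not a -> b) and (not b -> a)
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Kosaraju, pass 1: record vertices by increasing finish time (iterative DFS).
    visited = [False] * size
    order = []
    for s in range(size):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not visited[v]:
                    visited[v] = True
                    stack.append((v, iter(graph[v])))
                    break
            else:                       # all neighbours done: u is finished
                order.append(u)
                stack.pop()

    # Kosaraju, pass 2: components numbered in topological order of the graph.
    comp = [-1] * size
    current = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = current
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = current
                    stack.append(v)
        current += 1

    assignment = {}
    for i in range(1, n + 1):
        if comp[node(i)] == comp[node(-i)]:
            return None                 # x_i equivalent to NOT x_i: unsatisfiable
        # set x_i true iff its component comes later in topological order
        assignment[i] = comp[node(i)] > comp[node(-i)]
    return assignment

# (x1 or x2) and (not x1 or x2) and (not x2 or x3): forces x2 and x3 to be True.
print(two_sat(3, [(1, 2), (-1, 2), (-2, 3)]))
```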
How To Distinguish an Unrealistic Goal?
- Sorting: The naive approach is O(n²). The most efficient comparison-based approach for an arbitrary list is O(n log n) (for example Merge Sort, whose recurrence the Master Theorem resolves to O(n log n)), and no comparison sort can do better in general. There are integer-sorting algorithms proven to run in O(n log log n) in specific situations. Sorting a general list in O(n), as fast as reading it, could only work if the list is already almost sorted (at most a constant number of elements out of place), but then we are hardly solving the general sorting problem anymore.
- Sum: The naive approach is O(n), and sequentially you cannot do better, since every element has to be read at least once. With many processors working in parallel, a divide-and-conquer reduction brings the depth down to O(log n) (see the sketch after this list). The only possibility of O(1) would be to guess the sum without looking at the input, which is clearly unrealistic, so it is not a clever idea to look for such an algorithm.
- Matrix Multiplication: The naive approach is O(n³). Multiplying two matrices in O(n²), i.e. as fast as reading them, seems unrealistic. The most efficient approach known so far is about O(n^2.37). We still believe there might be a more efficient, realistic bound somewhere between O(n^2.37) and O(n²), but trying to find an algorithm that runs exactly in O(n²) would probably be a waste of time.
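As a small illustration of the Sum bullet above (my own sketch): divide-and-conquer summation still does O(n) total work, because every element must be read, but its recursion depth is only O(log n), which is exactly what many parallel processors could exploit.

```python
def tree_sum(a):
    """Sum a list by splitting it in half: O(n) total work, O(log n) depth."""
    if not a:
        return 0
    if len(a) == 1:
        return a[0]
    mid = len(a) // 2
    return tree_sum(a[:mid]) + tree_sum(a[mid:])   # the two halves are independent

print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```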
Final Words
Hereby I conclude my 6-part series "How To Explain Complexity Theory To Your Buddy". The name of the series has a 100% real backstory: I was explaining the material to a friend who is deeply interested in the theory of computer science, and I realized that if I cannot explain it in simpler terms, then I don't really know the topic myself. This is also known as the Feynman Technique.
This series contains all my findings and thoughts, but if you want to learn the subject seriously, you still need to do your own research. You might think I made a mistake somewhere, or find something controversial. I thank everyone who gives honest feedback and helps me write a better version.