Attempts and Thoughts on the TSP Problem
There has been no theoretical breakthrough on the NP problems represented by TSP, and in my view there will not be one. It is not a genuine problem of pure mathematics, so there is little to study at that level. Research papers on it are everywhere, yet they carry little theoretical value. This is not a question of mathematical theory; it is closer to applied mathematics, to engineering, even to software programming. Theoretical discussion is of little use here; what has value is simply how well a program actually performs.
Next, I discuss TSP and a class of related problems: the peddler (traveling salesman) formulation, Hamiltonian cycles, whether the graph is complete, symmetry, weighted or unweighted edges, visiting all points without necessarily returning to the starting point, and so on.
Although these variants differ in many ways, for ease of discussion only the following setting is considered here:
A complete graph in which, additionally, the distance between two points along an indirect route (passing through at least one other point), i.e., the sum of two or more edge lengths, is always greater than the direct distance between those two points. If the distances are taken to be geometric, whether in two or three dimensions, this is automatic: given coordinates for every point, the relationship above necessarily holds (the triangle inequality).
For TSP-like problems defined on arbitrary point connections, however, this property is not strictly required. Moreover, the standard TSP does not allow a point to be revisited, and on a symmetric complete graph satisfying the geometric assumption above there is no need to revisit a point anyway. In principle, though, a non-complete graph with non-geometric distances may well allow repeated visits. A non-complete graph can also be turned into a corresponding complete one. One way is to treat the distance between two unconnected points as infinity. Another, more workable approach is this: if repeated visits are allowed, take the sum of distances along a route through one or more intermediate points as the direct distance between the two endpoints. In my experience, however, completing a non-complete graph in this way reduces the amount of information available and works against some improvement algorithms.
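As a concrete illustration of the second completion strategy (replacing a missing edge by the shortest detour through intermediate points), here is a minimal sketch. The function name `complete_graph`, the matrix representation, and the choice of Floyd-Warshall are my own illustrative assumptions, not code from this article.

```python
def complete_graph(dist):
    """Complete an incomplete symmetric distance matrix.

    dist[i][j] is the direct edge length, or float("inf") where no edge
    exists.  Each missing or overlong entry is replaced by the shortest
    route through intermediate points (Floyd-Warshall), which matches the
    "repeated visits allowed" reading above.
    """
    n = len(dist)
    d = [row[:] for row in dist]              # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```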
In addition, the graph is assumed to be symmetric and weighted.
There is a dazzling variety of TSP algorithms, the most prominent being genetic algorithms and ant colony optimization. Their results are not reproducible: they are random and differ from run to run, they are clearly local optima in which the search gets stuck, and an unsatisfactory result can only be thrown away to start over. The greedy approach is reproducible and fast, but of low quality. Exhaustive search works only for very small numbers of points; the factorial growth of the computation with the number of points is exactly what makes this an NP problem.
Without exhaustive search, however, TSP offers no way in principle to verify that a solution is optimal. "Good" or "bad" becomes meaningful only when different methods are compared on final quality, time required, and so on. One inherent tension is the trade-off between computing time and quality: usually, the longer the run, the better the result, so in practice a time limit also limits quality. Because results differ from run to run, several attempts may be needed, but some run-time parameters can help balance and trade off these two concerns.
In a complete graph every point has N-1 incident edges, and in the most ideal case only the two shortest (lowest-weight) edges at each point would be used by the tour. If the final tour length equaled this sum, the result would be perfect; for completely random data, however, that is impossible. The ratio of the optimal tour length to this bound is, no doubt, distributed roughly like a Gaussian slightly above 1. In my experience the ratio mostly falls between 1.1 and 1.2, increasing slowly with the number of points, say from 30 up to 3000.
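In its usual form this bound is half of the sum, over all points, of each point's two cheapest incident edges, since every tour edge is shared by its two endpoints. Below is a small sketch of that bound and of the quality ratio discussed above; the function names and the NumPy matrix representation are illustrative assumptions of mine.

```python
import numpy as np

def two_edge_lower_bound(D):
    """Lower bound on any closed tour for a symmetric distance matrix D.

    Each point contributes its two cheapest incident edges; every tour edge
    is counted by both of its endpoints, hence the division by 2.
    """
    n = D.shape[0]
    total = 0.0
    for i in range(n):
        incident = np.delete(D[i], i)         # drop the zero self-distance
        total += np.sort(incident)[:2].sum()
    return total / 2.0

def tour_length(D, tour):
    """Length of a closed tour given as a cyclic sequence of point indices."""
    return sum(D[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

# Quality ratio for some tour; values around 1.1-1.2 would correspond to the
# observation above for random instances:
# ratio = tour_length(D, tour) / two_edge_lower_bound(D)
```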
My own method is what I call slowest insertion: randomly cut a segment out of the tour, then add the removed points back one at a time, each by the shortest insertion, until all are in the tour again. The method is somewhat like a genetic algorithm in that a single pass is not guaranteed to beat the previous tour, but a better tour can generally be found; the process is then repeated until no cut, however chosen, yields any further improvement.
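One plausible reading of this description is a ruin-and-recreate loop: cut a random contiguous segment out of the current tour, reinsert the removed points one at a time at the cheapest position, and keep the candidate only if it is shorter. The sketch below follows that reading; it is my interpretation, not the author's actual program, and the helper names and default parameters are hypothetical.

```python
import random

def tour_length(D, tour):
    """Length of a closed tour given as a cyclic sequence of point indices."""
    return sum(D[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def reinsert_cheapest(D, tour, removed):
    """Insert each removed point where it adds the least extra length."""
    tour = list(tour)
    for p in removed:
        best_pos, best_cost = 0, float("inf")
        for k in range(len(tour)):
            a, b = tour[k], tour[(k + 1) % len(tour)]
            cost = D[a][p] + D[p][b] - D[a][b]
            if cost < best_cost:
                best_pos, best_cost = k + 1, cost
        tour.insert(best_pos, p)
    return tour

def cut_and_rebuild(D, tour, seg_len, rng=random):
    """One pass: remove a random contiguous segment, then rebuild the tour."""
    start = rng.randrange(len(tour))
    removed = [tour[(start + i) % len(tour)] for i in range(seg_len)]
    removed_set = set(removed)
    kept = [p for p in tour if p not in removed_set]
    return reinsert_cheapest(D, kept, removed)

def improve(D, tour, passes=1000, seg_len=10, rng=random):
    """Repeat cut-and-rebuild, keeping a candidate only when it is shorter."""
    best, best_len = list(tour), tour_length(D, tour)
    for _ in range(passes):
        cand = cut_and_rebuild(D, best, seg_len, rng)
        cand_len = tour_length(D, cand)
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best
```

Stopping only when no cut of any kind improves the tour, as described above, would correspond to tracking how many consecutive passes fail to improve rather than using a fixed `passes` count.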
The algorithm uses the pairwise distances between points, of which there are of course N(N-1)/2. If N is large enough (say, over 1000), computing them takes some time on a weaker personal computer. Still, my method converges, the results are good, and the total time is not much longer: a few thousand points need only a few hours, and a few hundred points are very fast, a matter of minutes.
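For geometric instances, those N(N-1)/2 distances can be precomputed once as a full symmetric matrix; a minimal NumPy sketch, my own illustration rather than the article's code:

```python
import numpy as np

def distance_matrix(points):
    """Full symmetric Euclidean distance matrix for an (N, dim) array of points.

    Only N*(N-1)/2 entries are distinct, but storing the full matrix makes
    lookups during insertion trivial.
    """
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]   # shape (N, N, dim)
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Even at a few thousand points the full double-precision matrix stays modest (about 72 MB at N = 3000), so precomputing it is usually affordable on an ordinary machine.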
The greedy algorithm, of course, takes only a few seconds even for a few thousand points, but its result is poor and serves only as the starting point for my method.
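The article does not say which greedy construction it uses; a common choice consistent with "a few seconds even for a few thousand points" is the nearest-neighbor heuristic, sketched here as an assumed baseline whose output could serve as that starting tour.

```python
def nearest_neighbor_tour(D, start=0):
    """Greedy baseline: from `start`, always move to the nearest unvisited point."""
    n = len(D)
    unvisited = set(range(n))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: D[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour   # closed implicitly: the last point connects back to `start`
```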