Spherical Triangle Sampling with only one`Arccos`? call
From IrrlichtBaW/Nabla's GPU path tracing unit test. Fireflies come from EDS*L paths and low (32) spp.

Spherical Triangle Sampling with only one`Arccos` call

NOTE: If anyone know how to force GLSL or at least C++ syntax highlighting in LinkedIn Code Snippets, ping me!

You need to be kind to your GPU, and the GPU will be kind to you.

If you scour the net for implementations of Arvo's 1995 paper about Spherical Triangle Sampling, you will find many implementations, almost every single one unsuitable for use on a GPU.

Why? The sheer amount of calls to `acos`. On even the most powerful GPUs this can easily generate an instruction stream which can have 120 cycles of latency. This includes PBRTv3 and possibly even PBRTv4. My first plug-n-play trial of the algorithm dropped my Samples Per Second by a factor of 5.

TL;DR You only need one `acos` call in the whole Spherical Triangle Sampling routine, and that's only to compute the Solid Angle of a Triangle necessary for knowing its PDF.

This is because we don't need to know anything about the angles except for their sines and cosines, all of which we can get from your high-school trig identities and your still wikipedia-able Spherical Triangle Identities.

Aside from that there are a lot of conditional branches and checking for special cases which if you carefully apply IEEE754 rules, can be completely sidestepped.

You can the following code under the Apache 2.0 license, basically all I ask is that you passively help me find more customers for my consultancy.

/**
Copyright 2020 Mateusz "DevSH" Kielan

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
**/

I decided to be nice and not publish under Share-Alike or something similar, but if you can optimize the routines further (I don't like how many `cross` calls the slerps make), please feel free to drop me a message (also feel free to check out my in-house engine where you have a unit test ready to test triangle sampling).

So here it goes, first we need to turn a normal triangle into a spherical triangle which is relative to some origin. There's no way to sidestep the normalization.

mat3 irr_glsl_shapes_getSphericalTriangle(in mat3 vertices, in vec3 origin)
{
    // the `normalize` cannot be optimized out
    return mat3(normalize(vertices[0]-origin),normalize(vertices[1]-origin),normalize(vertices[2]-origin));
}

Now we can obtain the Solid Angle of the triangle, with only one `acos` call thanks to the application of Spherical Triangle Cosine Law.

// returns solid angle of a spherical triangle
// WARNING: can and will return NAN if one or three of the triangle edges are near zero length
// this function is beyond optimized.
float irr_glsl_shapes_SolidAngleOfTriangle(in mat3 sphericalVertices, out vec3 cos_vertices, out vec3 sin_vertices, out float cos_a, out float cos_c, out float csc_b, out float csc_c)
{    
    // The sides are denoted by lower-case letters a, b, and c. On the unit sphere their lengths are numerically equal to the radian measure of the angles that the great circle arcs subtend at the centre. The sides of proper spherical triangles are (by convention) less than PI
    const vec3 cos_sides = vec3(dot(sphericalVertices[1],sphericalVertices[2]),dot(sphericalVertices[2],sphericalVertices[0]),dot(sphericalVertices[0],sphericalVertices[1]));
    const vec3 csc_sides = inversesqrt(vec3(1.0)-cos_sides*cos_sides);

    // these variables might eventually get optimized out
    cos_a = cos_sides[0];
    cos_c = cos_sides[2];
    csc_b = csc_sides[1];
    csc_c = csc_sides[2];
    
    // Both vertices and angles at the vertices are denoted by the same upper case letters A, B, and C. The angles A, B, C of the triangle are equal to the angles between the planes that intersect the surface of the sphere or, equivalently, the angles between the tangent vectors of the great circle arcs where they meet at the vertices. Angles are in radians. The angles of proper spherical triangles are (by convention) less than PI
    cos_vertices = (cos_sides-cos_sides.yzx*cos_sides.zxy)*csc_sides.yzx*csc_sides.zxy; // using Spherical Law of Cosines
    sin_vertices = sqrt(vec3(1.0)-cos_vertices*cos_vertices);
    
    // the solid angle of a triangle is the sum of its planar vertices' angles minus PI
    return irr_glsl_getArccosSumofABC_minus_PI(cos_vertices[0],cos_vertices[1],cos_vertices[2],sin_vertices[0],sin_vertices[1],sin_vertices[2]);
}

The function returns some of the cosines and sines we'll need later for sample generation.

You get the cosines of the side lengths by simple dot products of normalized spherical triangle vertices' vectors. Because the triangle is defined to be the smallest of 8 possible triangles, we can assume the angles will be between 0 and PI, so a square root can be used to compute the sine.

Instead of using 3 cross products with Graham-Schmidt orthogonalization and then computing the dot product between the planes that hold the great circles which contain the triangle's sides, we apply the Spherical Law of Cosines to get the planar angles at the vertices.

Yes, you can see a problem at this stage, namely that we'll have a huge problem if the sine of any side length is 0 that will lead to +INF or NaN (because of the `inversesqrt` call it will be a NaN). However lets consider when that would be the case, only when the length of any side of the Spherical Triangle approaches 0. I'd like to remark that in that case the solid angle of the triangle will be 0, so we can let the NaN propagate and then the calling code check the result for `isnan` and replace it with 0.

You do not wish to write an `if` statement around the rest of the code in a Monte Carlo renderer as you're guaranteed to diverge on the GPU and probably murder your branch predictor on the CPU. You want predicated stores (or better, execution).

Finally we call a function which has been written especially to implement the identity `acos(acos(A)+acos(B)+acos(C))-PI` for a specific case of all angles being between 0 and PI

// returns `acos(acos(A)+acos(B)+acos(C))-PI` but requires `sinA,sinB,sinC` are all positive
float irr_glsl_getArccosSumofABC_minus_PI(in float cosA, in float cosB, in float cosC, in float sinA, in float sinB, in float sinC)
{
    // sorry about the naming of `something` I just can't seem to be able to give good name to the variables that is consistent with semantics
    const bool something0 = cosA<(-cosB);
    const float cosSumAB = cosA*cosB-sinA*sinB;
    const bool something1 = cosSumAB<(-cosC);
    const bool something2 = cosSumAB<cosC;
    // apply triple angle formula
    const float absArccosSumABC = acos(cosSumAB*cosC-(cosA*sinB+sinA*cosB)*sinC);
    return ((something0 ? something2:something1) ? (-absArccosSumABC):absArccosSumABC)+(something0||something1 ? irr_glsl_PI:(-irr_glsl_PI));
}

Thanks to only having one `acos` call, we're guaranteed the return value will be in the range 0 to 2 times PI. In the context of programming and needing inverse trig functions to be unique mappings, your standard math textbook lies.

No alt text provided for this image

Now I introduce the function that generates the samples on a triangle.

// WARNING: can and will return NAN if one or three of the triangle edges are near zero length
vec3 irr_glsl_sampling_generateSphericalTriangleSample(out float rcpPdf, in mat3 sphericalVertices, in vec2 u)
{
    // for angles between view-to-vertex vectors
    float cos_a,cos_c,csc_b,csc_c;
    // Both vertices and angles at the vertices are denoted by the same upper case letters A, B, and C. The angles A, B, C of the triangle are equal to the angles between the planes that intersect the surface of the sphere or, equivalently, the angles between the tangent vectors of the great circle arcs where they meet at the vertices. Angles are in radians. The angles of proper spherical triangles are (by convention) less than PI
    vec3 cos_vertices,sin_vertices;
    // get solid angle, which is also the reciprocal of the probability
    rcpPdf = irr_glsl_shapes_SolidAngleOfTriangle(sphericalVertices,cos_vertices,sin_vertices,cos_a,cos_c,csc_b,csc_c);

    // this part literally cannot be optimized further
    float negSinSubSolidAngle,negCosSubSolidAngle;
    irr_glsl_sincos(rcpPdf*u.x-irr_glsl_PI,negSinSubSolidAngle,negCosSubSolidAngle);

    const float p = negCosSubSolidAngle*sin_vertices[0]-negSinSubSolidAngle*cos_vertices[0];
    const float q = -negSinSubSolidAngle*sin_vertices[0]-negCosSubSolidAngle*cos_vertices[0];
    
    // we could optimize further, everything up and including to the first slerp, because precision here is just godawful
    float u_ = q - cos_vertices[0];
    float v_ = p + sin_vertices[0]*cos_c;

    const float clamp(((v_*q - u_*p)*cos_vertices[0] - v_) / ((v_*p + u_*q)*sin_vertices[0]), -1.0, 1.0); // TODO: get rid of this clamp (by improving the precision here)

    // the slerps could probably be optimized by sidestepping `normalize` calls and accumulating scaling factors
    vec3 C_s = irr_glsl_slerp_impl_impl(sphericalVertices[0], sphericalVertices[2]*csc_b, cosAngleAlongAC);

    const float cosBC_s = dot(C_s,sphericalVertices[1]);
    const float cosAngleAlongBC_s = 1.0+cosBC_s*u.y-u.y;

    return irr_glsl_slerp_impl_impl(sphericalVertices[1], C_s*inversesqrt(1.0-cosBC_s*cosBC_s), cosAngleAlongBC_s);
}

We apply sine and cosine angle sum formulas to avoid having to know the solid angle of the triangle for the computation of the first sub-triangle's solid angle derived quantities (`p` and `q`) given the first random variable.

The only source of NaN can be the `SolidAngleOfTriangle` function, but it cannot produce values less than 0 or greater than 2 times PI. We don't need to worry about sample directions produced when solid angle is a NaN, this indicates a zero solid angle, therefore a zero light contribution if we specify the triangle's radiance, the correct PBR unit for area lights.

Because we always use trig identities, our values can never be nonsense, even in the presence of IEEE754 rounding errors or the 14 bit precision of `inversesqrt` and `sqrt`.

However we need to be careful with the cosines of rotation angles we pass to our slerp because there's a sqrt involved (half angle formula).

The cherry-on-top is our optimized slerp function which will perform a slerp from the first argument to the second, only knowing the cosine of the angle we need to rotate by, as well as relying on the two argument vector's cross product to have a length of 1.0 (in order to skip the normalize when computing the rotation axis for the quaternion).

vec3 irr_glsl_slerp_impl_impl(in vec3 start, in vec3 preScaledWaypoint, float cosAngleFromStart)
{
    vec3 planeNormal = cross(start,preScaledWaypoint);
    
    cosAngleFromStart *= 0.5;
    const float sinAngle = sqrt(0.5-cosAngleFromStart);
    const float cosAngle = sqrt(0.5+cosAngleFromStart);
    
    planeNormal *= sinAngle;
    const vec3 precompPart = cross(planeNormal,start)*2.0;

    return start+precompPart*cosAngle+cross(planeNormal,precompPart);
}

Just to prove it works, scroll back to the top cover image of this post.

Alternatively plug it into one of Shader Toy examples with moving triangle lights, I've found my optimized implementation's solid angle calculation to be always within 0.5% of the non-optimized `acos` computation, but on average the error was much less indicating a precision of between 13 and 14 bits (which is most likely due to my use of `inversesqrt` and `sqrt`).

The performance overhead, equal sample for sample (not equal quality) is about 14% vs. Area Sampling in a full fledged renderer with MIS and multiple bounces. This naturally means that for an equal-time or equal-cost comparison we'd easily win with area sampling.

要查看或添加评论,请登录

Matt Kielan的更多文章

社区洞察

其他会员也浏览了