Wednesday, 1 March 2023

[OpenCV / OpenGL] Facial Albedo Approximation in Constrained Environments

This is a writeup of some of the work I did during my postgraduate studies. The purpose of this research was to find solutions to the problem of real-time facial reflectance capture on constrained hardware, where facial reflectance means the albedo and roughness textures used by physically based BRDFs.

Constrained hardware refers to webcams or phone cameras, paired with laptop hardware or low-power mobile chips with limited compute and graphics processing capability.

Albedo Approximation

The first step was to extract albedo information from images. I used pre-existing renders for this process, with accurate normals rendered for each face.  
Test images were rendered in Blender using HDRIs for lighting.

Using the normal and color data, spherical harmonics can be extracted from the image. Lighting extraction can then be performed using inverse rendering (Marschner, Guenter and Raghupathy, 2000).

Image = Diffuse * Albedo + Specular.
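Under this model, extracting the lighting by inverse rendering amounts to a least-squares fit of low-order spherical harmonic coefficients to the observed shading. A minimal sketch in NumPy, assuming per-pixel unit normals and grayscale intensities are already available (the function names are my own, not from the original pipeline; the basis constants are the standard Ramamoorthi–Hanrahan ones):

```python
import numpy as np

def sh_basis(normals):
    """First 9 real spherical-harmonic basis functions evaluated at
    unit normals (N, 3) -> (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.full_like(x, 0.282095),        # l=0
        0.488603 * y,                     # l=1, m=-1
        0.488603 * z,                     # l=1, m=0
        0.488603 * x,                     # l=1, m=1
        1.092548 * x * y,                 # l=2, m=-2
        1.092548 * y * z,                 # l=2, m=-1
        0.315392 * (3 * z**2 - 1),        # l=2, m=0
        1.092548 * x * z,                 # l=2, m=1
        0.546274 * (x**2 - y**2),         # l=2, m=2
    ], axis=1)

def fit_sh_lighting(normals, intensities):
    """Least-squares fit of 9 SH coefficients so that
    sh_basis(normals) @ coeffs approximates the observed intensities."""
    B = sh_basis(normals)
    coeffs, *_ = np.linalg.lstsq(B, intensities, rcond=None)
    return coeffs
```

With accurate normals from the renders, nine coefficients per channel are enough to capture the smooth diffuse component of the lighting.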

Rearranging the equation by dividing through by the diffuse component results in an image that contains the albedo plus a specular term scaled by the diffuse shading. Spherical harmonics are used to approximate the diffuse component and have been shown to be up to 98% accurate for this task (Ramamoorthi, 2006).

Image / Diffuse = Albedo + Specular / Diffuse.
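Once the lighting coefficients are known, this division can be applied per texel. A hedged sketch, assuming grayscale input and the standard nine-term SH basis (the function name and the epsilon guard are my own additions):

```python
import numpy as np

def albedo_estimate(image, normals, sh_coeffs, eps=1e-4):
    """Divide each pixel by its SH-reconstructed diffuse shading.
    image: (H, W) luminance, normals: (H, W, 3) unit normals,
    sh_coeffs: (9,) fitted lighting coefficients.
    Returns the albedo-plus-specular image."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    basis = np.stack([
        np.full_like(x, 0.282095),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z**2 - 1),
        1.092548 * x * z, 0.546274 * (x**2 - y**2),
    ], axis=-1)
    diffuse = basis @ sh_coeffs            # (H, W) reconstructed shading
    # eps guards against division by near-zero shading in shadowed regions
    return image / np.maximum(diffuse, eps)
```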



The result of diffuse removal is shown above. The bright highlights at grazing angles can be removed by manually accounting for the Fresnel effect.
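One manual approach is to down-weight texels seen at grazing angles using Schlick's approximation of the Fresnel term. A sketch, assuming a dielectric base reflectance of roughly 0.04 for skin (this particular weighting scheme is illustrative, not necessarily the exact one used in this work):

```python
import numpy as np

def schlick_fresnel(cos_theta, f0=0.04):
    """Schlick's approximation of Fresnel reflectance given the cosine
    between the surface normal and the view direction. f0 ~ 0.04 is a
    common dielectric value (assumed here for skin)."""
    cos_theta = np.clip(cos_theta, 0.0, 1.0)
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def grazing_weight(normals, view_dir, f0=0.04):
    """Down-weight texels seen at grazing angles, where the Fresnel term
    (and thus residual specular) dominates.
    normals: (..., 3) unit normals, view_dir: (3,) unit vector to camera."""
    cos_theta = np.clip(normals @ view_dir, 0.0, 1.0)
    return 1.0 - schlick_fresnel(cos_theta, f0)
```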

However, a more powerful technique is to fold highlight removal into a single step using a process called corrective fields (Ichim et al., 2015).


Corrective fields remove many of the artifacts introduced by grazing angles and, as a bonus, help to reduce the specular component that is still present in the image.

There are still two issues with the calculated result.  

  1. The image still contains specular information. 
  2. The image is limited to a single angle, so the texture is stretched on the sides of the face, or missing where part of the face is occluded.

Multi-view Merging

The issues resulting from simple inverse rendering can both be solved by combining the results of multiple viewpoints. 
For the second issue, multi-view merging means more areas of the face are visible to the system, so areas are less likely to be missed.

An example of multiple viewpoints merged into a single texture.

The above image combines many angles to obtain the final result. Diffuse lighting has been removed from the image; however, specular highlights are still clearly visible.
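A common way to perform this kind of merge is a per-texel weighted average in UV space, with weights such as the cosine of the viewing angle so that frontal views dominate and occluded texels are ignored. A minimal sketch, with the caveat that the weighting choice here is an assumption rather than the exact scheme used in this work:

```python
import numpy as np

def merge_views(textures, weights, eps=1e-6):
    """Blend per-view textures (V, H, W) already projected into a shared
    UV space, using per-texel weights (V, H, W). A weight of zero marks
    an occluded or unreliable sample."""
    textures = np.asarray(textures, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum(axis=0)
    # eps avoids division by zero where no view saw the texel
    return (textures * weights).sum(axis=0) / np.maximum(total, eps)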

Specular Removal

An interesting property of specular highlights is that they are view-dependent, whereas albedo is not. Using the inverse rendering via spherical harmonics technique described above, a texture containing albedo + specular can be obtained for each view. Because albedo is unaffected by viewing angle while specular is not, any variation in luminosity between views can be attributed to a change in specular intensity.

In theory, by choosing the minimum value of a surface point across multiple viewing angles, we can find the angle with the lowest specular response and use that value as the most accurate estimate of the surface's albedo.
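The per-texel minimum can be sketched as follows, assuming each view's albedo + specular estimate has already been projected into a shared UV space with a visibility mask (function and parameter names are my own):

```python
import numpy as np

def min_specular_albedo(view_textures, visibility):
    """Pick, per texel, the darkest visible sample across views.
    view_textures: (V, H, W) albedo + specular estimates in UV space.
    visibility: boolean (V, H, W); invisible samples are ignored.
    The darkest sample has the lowest specular response, so it is the
    closest available estimate of the true albedo."""
    stack = np.where(visibility, view_textures, np.inf)
    return stack.min(axis=0)
```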

Choosing a minimum is technically correct, but in practice limitations of the capture process and errors in the spherical harmonic estimation mean that naively choosing the darkest pixel often results in very visible seams in the image.

Seams appear when naively sampling based on minimum pixel intensity.

Combining techniques

The trick to obtaining the best result, and to removing the specular response, is to reverse the above process and merge the images before attempting to remove lighting information from them.
Whilst this makes little sense from a theoretical standpoint, it ultimately produces better results that are free from seams and specular highlights.

Pre-merge extraction (top) vs post-merge extraction (bottom)

 

Future work

I obtained some very good results when reframing the problem from inverse rendering to linear regression. However, it is difficult to optimise this kind of solution for low-power devices, as a sufficiently powerful CPU is required.

Weighted least squares linear solve
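A weighted least-squares solve can be reduced to an ordinary least-squares solve by scaling each equation by the square root of its weight. A minimal sketch of that reduction (the formulation of the actual regression problem is not reproduced here, so this only shows the solver step):

```python
import numpy as np

def weighted_lstsq(A, b, w):
    """Solve argmin_x sum_i w_i * (A[i] @ x - b[i])**2 by scaling each
    row of A and entry of b with sqrt(w_i), then using an ordinary
    least-squares solve."""
    sw = np.sqrt(np.asarray(w, dtype=float))[:, None]
    x, *_ = np.linalg.lstsq(A * sw, b * sw[:, 0], rcond=None)
    return x
```

Weights can encode per-sample confidence, for example down-weighting grazing-angle or poorly lit observations.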

 

 


References 

Ghosh, A. et al. (2011) ‘Multiview face capture using polarized spherical gradient illumination’, Proceedings of the 2011 SIGGRAPH Asia Conference on - SA ’11, 30(6), p. 1. doi: 10.1145/2024156.2024163.

Ramamoorthi, R. (2006) ‘Modeling Illumination Variation with Spherical Harmonics’, Face Processing: Advanced Modeling Methods, pp. 385–424.

Ichim, A. E., Bouaziz, S. and Pauly, M. (2015) ‘Dynamic 3D avatar creation from hand-held video input’, ACM Transactions on Graphics, 34(4), pp. 45:1–45:14. doi: 10.1145/2766974.