DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
Author: Internet
Abstract
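- Original: Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data.
- Translation: The computer graphics, 3D computer vision and robotics communities have proposed many ways to represent 3D geometry for rendering and reconstruction, each trading off fidelity, efficiency and compression. This paper introduces DeepSDF, a learned continuous signed distance function (SDF) representation of a whole class of shapes. It supports high-quality shape representation, interpolation, and completion from partial and noisy 3D input.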
- Original: DeepSDF, like its classical counterpart, represents a shape's surface by a continuous volumetric field: the magnitude of a point in the field represents the distance to the surface boundary and the sign indicates whether the region is inside (-) or outside (+) of the shape, hence our representation implicitly encodes a shape's boundary as the zero-level-set of the learned function while explicitly representing the classification of space as being part of the shape's interior or not.
- Translation: Like the classical SDF, DeepSDF represents a shape's surface with a continuous volumetric field. The magnitude at each point is its distance to the surface boundary, and the sign says whether the point is inside (-) or outside (+) the shape. The boundary is therefore encoded implicitly as the zero level set of the learned function, while the split of space into interior and exterior is represented explicitly.
- Original: While classical SDFs, both in analytical and discretized voxel form, typically represent the surface of a single shape, DeepSDF can represent an entire class of shapes. Furthermore, we show state-of-the-art performance for learned 3D shape representation and completion while reducing the model size by an order of magnitude compared with previous work.
- Translation: A classical SDF, whether analytical or discretized into voxels, usually represents the surface of a single shape, whereas DeepSDF can represent an entire class of shapes. The paper also reports state-of-the-art results for learned 3D shape representation and completion, with a model an order of magnitude smaller than in previous work.
Introduction
- Original: Deep convolutional networks which are a mainstay of image-based approaches grow quickly in space and time complexity when directly generalized to the 3rd spatial dimension, and more classical and compact surface representations such as triangle or quad meshes pose problems in training since we may need to deal with an unknown number of vertices and arbitrary topology. These challenges have limited the quality, flexibility and fidelity of deep learning approaches when attempting to either input 3D data for processing or produce 3D inferences for object segmentation and reconstruction.
- Translation: Deep convolutional networks, the mainstay of image-based methods, grow quickly in time and space complexity when extended directly to a third spatial dimension. More classical and compact surface representations, such as triangle or quad meshes, are hard to train on because the number of vertices is unknown and the topology is arbitrary. These problems have limited the quality, flexibility and fidelity of deep learning methods, both when consuming 3D data and when producing 3D inferences for object segmentation and reconstruction.
- Original: In this work, we present a novel representation and approach for generative 3D modeling that is efficient, expressive, and fully continuous. Our approach uses the concept of a SDF, but unlike common surface reconstruction techniques which discretize this SDF into a regular grid for evaluation and measurement denoising, we instead learn a generative model to produce such a continuous field.
- Translation: The paper presents a new representation and method for generative 3D modeling that is efficient, expressive and fully continuous. It builds on the SDF concept, but unlike common surface reconstruction techniques, which discretize the SDF onto a regular grid for evaluation and measurement denoising, it learns a generative model that produces the continuous field directly.
- Original: The proposed continuous representation may be intuitively understood as a learned shape-conditioned classifier for which the decision boundary is the surface of the shape itself, as shown in Fig. 2. Our approach shares the generative aspect of other works seeking to map a latent space to a distribution of complex shapes in 3D [54], but critically differs in the central representation. While the notion of an implicit surface defined as a SDF is widely known in the computer vision and graphics communities, to our knowledge no prior works have attempted to directly learn continuous, generalizable 3D generative models of SDFs.
- Translation: The proposed continuous representation can be understood intuitively as a learned, shape-conditioned classifier whose decision boundary is the surface of the shape itself (Fig. 2). Like other work that maps a latent space to a distribution of complex 3D shapes [54], the method is generative, but its central representation is fundamentally different. Implicit surfaces defined as SDFs are well known in computer vision and graphics, yet to the authors' knowledge no earlier work tried to learn continuous, generalizable 3D generative models of SDFs directly.
- Original: Our contributions include: (i) the formulation of generative shape-conditioned 3D modeling with a continuous implicit surface, (ii) a learning method for 3D shapes based on a probabilistic auto-decoder, and (iii) the demonstration and application of this formulation to shape modeling and completion. Our models produce high quality continuous surfaces with complex topologies, and obtain state-of-the-art results in quantitative comparisons for shape reconstruction and completion. As an example of the effectiveness of our method, our models use only 7.4 MB (megabytes) of memory to represent entire classes of shapes (for example, thousands of 3D chair models). This is, for example, less than half the memory footprint (16.8 MB) of a single uncompressed $512^3$ 3D bitmap.
- Translation: The contributions are: (1) a formulation of generative, shape-conditioned 3D modeling with a continuous implicit surface; (2) a learning method for 3D shapes based on a probabilistic auto-decoder; (3) a demonstration of this formulation on shape modeling and completion. The models produce high-quality continuous surfaces with complex topology and achieve state-of-the-art results in quantitative comparisons for shape reconstruction and completion. As an example, a model needs only 7.4 MB to represent a whole class of shapes (such as thousands of chairs), which is less than half of the 16.8 MB taken by a single uncompressed $512^3$ 3D bitmap.
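The 16.8 MB figure can be checked with a few lines of arithmetic. This is a small sketch that assumes the bitmap stores one bit per voxel and that "MB" means decimal megabytes; both assumptions are mine, chosen because they reproduce the quoted number.

```python
# Memory footprint of an uncompressed 512^3 bitmap at 1 bit per voxel (assumed).
voxels = 512 ** 3
bitmap_bytes = voxels // 8        # 8 voxels packed into each byte
bitmap_mb = bitmap_bytes / 1e6    # decimal megabytes (assumed)
model_mb = 7.4                    # model size quoted in the paper

print(f"bitmap: {bitmap_mb:.1f} MB, model/bitmap ratio: {model_mb / bitmap_mb:.2f}")
```

Under these assumptions the model is indeed smaller than half of a single bitmap.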
Related Work
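- Original: We review three main areas of related work: 3D representations for shape learning (Sec. 2.1), techniques for learning generative models (Sec. 2.2), and shape completion (Sec. 2.3).
- Translation: Three areas of related work are reviewed: 3D representations for shape learning (Sec. 2.1), techniques for learning generative models (Sec. 2.2), and shape completion (Sec. 2.3).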
Representations for 3D Shape Learning
- Original: Representations for data-driven 3D learning approaches can be largely classified into three categories: point-based, mesh-based, and voxel-based methods. While some applications such as 3D-point-cloud-based object classification are well suited to these representations, we address their limitations in expressing continuous surfaces with complex topologies.
- Translation: Representations for data-driven 3D learning fall roughly into three categories: point-based, mesh-based and voxel-based. Some applications, such as object classification on 3D point clouds, suit these representations well, but the paper discusses their limits in expressing continuous surfaces with complex topology.
Point-Based
- Original: A point cloud is a lightweight 3D representation that closely matches the raw data that many sensors (i.e. LiDARs, depth cameras) provide, and hence is a natural fit for applying 3D learning. PointNet [38, 39], for example, uses max-pool operations to extract global shape features, and the technique is widely used as an encoder for point generation networks [57, 1]. There is a sizable list of related works to the PointNet style approach of learning on point clouds. A primary limitation, however, of learning with point clouds is that they do not describe topology and are not suitable for producing watertight surfaces.
- Translation: A point cloud is a lightweight 3D representation that closely matches the raw data from many sensors (LiDAR, depth cameras), so it is a natural fit for 3D learning. PointNet, for example, uses max-pooling to extract global shape features and is widely used as the encoder in point generation networks. Many related works follow the PointNet style of learning on point clouds. The main limitation is that point clouds do not describe topology and are not suited to producing watertight surfaces.
Mesh-based
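- Original: Various approaches represent classes of similarly shaped objects, such as morphable human body parts, with predefined template meshes and some of these models demonstrate high fidelity shape generation results [2, 34]. Other recent works [3] use poly-cube mapping [51] for shape optimization. While the use of template meshes is convenient and naturally provides 3D correspondences, it can only model shapes with fixed mesh topology.
- Translation: Various methods represent classes of similarly shaped objects, such as morphable human body parts, with predefined template meshes, and some of them show high-fidelity shape generation [2, 34]. Other recent work [3] uses poly-cube mapping [51] for shape optimization. Template meshes are convenient and give 3D correspondences for free, but they can only model shapes with a fixed mesh topology.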
Voxel-based
- Original: Voxels, which non-parametrically describe volumes with 3D grids of values, are perhaps the most natural extension into the 3D domain of the well-known learning paradigms (i.e., convolution) that have excelled in the 2D image domain. The most straightforward variant of voxel-based learning is to use a dense occupancy grid (occupied / not occupied). Due to the cubically growing compute and memory requirements, however, current methods are only able to handle low resolutions ($128^3$ or below). As such, voxel-based approaches do not preserve fine shape details [56, 14], and additionally voxels visually appear significantly different than high-fidelity shapes, since when rendered their normals are not smooth. Octree-based methods [52, 43, 26] alleviate the compute and memory limitations of dense voxel methods, extending for example the ability to learn at up to $512^3$ resolution [52], but even this resolution is far from producing shapes that are visually compelling.
- Translation: Voxels describe volumes non-parametrically as 3D grids of values, and are perhaps the most natural way to carry the learning paradigms that excel on 2D images (convolution) over to 3D. The most direct variant uses a dense occupancy grid (occupied / not occupied). Because compute and memory grow cubically, current methods only handle low resolutions ($128^3$ or below). Voxel-based methods therefore lose fine shape detail [56, 14], and rendered voxels look clearly different from high-fidelity shapes because their normals are not smooth. Octree-based methods [52, 43, 26] ease the compute and memory limits of dense voxels and allow learning at up to $512^3$ resolution [52], but even that is far from visually compelling.
- Original: Aside from occupancy grids, and more closely related to our approach, it is also possible to use a 3D grid of voxels to represent a signed distance function. This inherits from the success of fusion approaches that utilize a truncated SDF (TSDF), pioneered in [15, 37], to combine noisy depth maps into a single 3D model. Voxel-based SDF representations have been extensively used for 3D shape learning [59, 16, 49], but their use of discrete voxels is expensive in memory. As a result, the learned discrete SDF approaches generally present low resolution shapes. [30] reports various wavelet transform-based approaches for distance field compression, while [10] applies dimensionality reduction techniques on discrete TSDF volumes. These methods encode the SDF volume of each individual scene rather than a dataset of shapes.
- Translation: Besides occupancy grids, and closer to DeepSDF, a 3D voxel grid can also store a signed distance function. This builds on the success of fusion methods that use a truncated SDF (TSDF), pioneered in [15, 37], to merge noisy depth maps into a single 3D model. Voxel-based SDFs have been used widely for 3D shape learning [59, 16, 49], but discrete voxels are expensive in memory, so learned discrete SDF methods usually produce low-resolution shapes. [30] reports wavelet-based compression of distance fields, and [10] applies dimensionality reduction to discrete TSDF volumes. These methods encode the SDF volume of each individual scene rather than a dataset of shapes.
Representation Learning Techniques
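- Original: Modern representation learning techniques aim at automatically discovering a set of features that compactly but expressively describe data. For a more extensive review of the field, we refer to Bengio et al. [4].
- Translation: Modern representation learning aims to discover automatically a set of features that describe data compactly yet expressively. For a broader review of the field, see Bengio et al. [4].
- Original: Generative Adversarial Networks. GANs [21] and their variants [13, 41] learn deep embeddings of target data by training discriminators adversarially against generators. Applications of this class of networks [29, 31] generate realistic images of humans, objects, or scenes. On the downside, adversarial training for GANs is known to be unstable. In the 3D domain, Wu et al. [54] trains a GAN to generate objects in a voxel representation, while the recent work of Hamu et al. [23] uses multiple parameterization planes to generate shapes of topological spheres.
- Translation: Generative adversarial networks. GANs [21] and their variants [13, 41] learn deep embeddings of the target data by training a discriminator adversarially against a generator. Applications [29, 31] generate realistic images of people, objects and scenes. The downside is that adversarial training is known to be unstable. In 3D, Wu et al. [54] train a GAN to generate objects as voxels, while Hamu et al. [23] use multiple parameterization planes to generate shapes that are topological spheres.
Note: GAN adversarial training is unstable. A GAN learns deep embeddings of the target data by training a discriminator adversarially against a generator.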
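- Original: Auto-encoders. Auto-encoder outputs are expected to replicate the original input given the constraint of an information bottleneck between the encoder and decoder. The ability of auto-encoders as a feature learning tool has been evidenced by the vast variety of 3D shape learning works in the literature [16, 49, 2, 22, 55] that adopt auto-encoders for representation learning. Recent 3D vision works [6, 2, 34] often adopt a variational auto-encoder (VAE) learning scheme, in which bottleneck features are perturbed with Gaussian noise to encourage smooth and complete latent spaces. The regularization on the latent vectors enables exploring the embedding space with gradient descent or random sampling.
- Translation: Auto-encoders. An auto-encoder is expected to reproduce its input subject to an information bottleneck between encoder and decoder. Its value as a feature learning tool is shown by the many 3D shape learning works that use it for representation learning. Recent 3D vision work [6, 2, 34] often uses a variational auto-encoder (VAE), which perturbs the bottleneck features with Gaussian noise to encourage a smooth and complete latent space. Regularizing the latent vectors makes it possible to explore the embedding space by gradient descent or random sampling.
Note: A VAE perturbs the bottleneck features with Gaussian noise to encourage a smooth and complete latent space. Regularizing the latent vectors is what allows the embedding space to be explored by gradient descent or random sampling.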
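- Original: Optimizing Latent Vectors. Instead of using the full auto-encoder for representation learning, an alternative is to learn compact data representations by training decoder-only networks. This idea goes back to at least the work of Tan et al. [50] which simultaneously optimizes the latent vectors assigned to each data point and the decoder weights through back-propagation. For inference, an optimal latent vector is searched to match the new observation with fixed decoder parameters. Similar approaches have been extensively studied in [42, 8, 40], for applications including noise reduction, missing measurement completions, and fault detections. Recent approaches [7, 20] extend the technique by applying deep architectures. Throughout the paper we refer to this class of networks as auto-decoders, for they are trained with self-reconstruction loss on decoder-only architectures.
- Translation: **Optimizing latent vectors:** Instead of a full auto-encoder, compact representations can be learned by training a decoder-only network. The idea goes back at least to Tan et al. [50], who optimize the latent vector assigned to each data point and the decoder weights simultaneously through back-propagation. At inference, the decoder parameters are fixed and the latent vector that best matches the new observation is searched for. Similar methods have been studied extensively in [42, 8, 40] for noise reduction, completion of missing measurements, and fault detection. Recent methods [7, 20] extend the technique with deep architectures. The paper calls this class of networks auto-decoders, because they are trained with a self-reconstruction loss on a decoder-only architecture.
Note: A decoder-only network is trained by back-propagation, optimizing the per-sample latent vectors and the decoder weights together. An auto-decoder is a decoder-only architecture trained with a self-reconstruction loss.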
Shape Completion
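- Original: 3D shape completion related works aim to infer unseen parts of the original shape given sparse or partial input observations. This task is analogous to image inpainting in 2D computer vision. Classical surface reconstruction methods complete a point cloud into a dense surface by fitting radial basis functions (RBF) [11] to approximate implicit surface functions, or by casting the reconstruction from oriented point clouds as a Poisson problem [32]. These methods only model a single shape rather than a dataset. Various recent methods use data-driven approaches for the 3D completion task. Most of these methods adopt encoder-decoder architectures to reduce partial inputs of occupancy voxels [56], discrete SDF voxels [16], depth maps [44], RGB images [14, 55] or point clouds [49] into a latent vector and subsequently generate a prediction of full volumetric shape based on learned priors.
- Translation: Work on 3D shape completion tries to infer the unseen parts of a shape from sparse or partial observations. The task is analogous to image inpainting in 2D vision. Classical surface reconstruction completes a point cloud into a dense surface either by fitting radial basis functions (RBF) [11] to approximate an implicit surface function, or by casting reconstruction from oriented point clouds as a Poisson problem [32]. These methods model a single shape, not a dataset. Recent methods are data-driven. Most use an encoder-decoder architecture that reduces a partial input (occupancy voxels [56], discrete SDF voxels [16], depth maps [44], RGB images [14, 55] or point clouds [49]) to a latent vector, and then predict the full volumetric shape from learned priors.
Note: The difficulty in 3D shape completion is inferring the unseen parts of the original shape from sparse or partial observations.
Question: Radial basis functions. This involves function fitting, specifically fitting RBFs. How is reconstruction from oriented point clouds cast as a Poisson problem?
Key point: The paper mentions modeling methods for a single shape that rely on function fitting.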
Modeling SDFs with Neural Networks
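- Original: In this section we present DeepSDF, our continuous shape learning approach. We describe modeling shapes as the zero iso-surface decision boundaries of feed-forward networks trained to represent SDFs. A signed distance function is a continuous function that, for a given spatial point, outputs the point's distance to the closest surface, whose sign encodes whether the point is inside (negative) or outside (positive) of the watertight surface:

$$SDF(x) = s : x \in \mathbb{R}^3,\ s \in \mathbb{R} \tag{1}$$

The underlying surface is implicitly represented by the iso-surface of SDF(·) = 0. A view of this implicit surface can be rendered through raycasting or rasterization of a mesh obtained with, for example, Marching Cubes [35].
- Translation: This section introduces DeepSDF, the continuous shape learning method. Shapes are modeled as the zero iso-surface decision boundary of a feed-forward network trained to represent an SDF. A signed distance function is a continuous function that, for a given point in space, returns the distance to the closest surface, with the sign saying whether the point is inside (negative) or outside (positive) the watertight surface (Eq. 1). The underlying surface is represented implicitly by the iso-surface SDF(·) = 0. A view of it can be rendered by raycasting, or by rasterizing a mesh extracted with, for example, Marching Cubes [35].
Note: The shape is described as the zero iso-surface decision boundary of a feed-forward network trained to represent an SDF.
Question: What does the zero iso-surface of a feed-forward network mean?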
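To make the sign convention and the "zero iso-surface" concrete, here is a minimal sketch using the analytic SDF of a sphere. The function name `sphere_sdf` and the sample points are illustrative and not from the paper; a trained network would play the role of this function for more complex shapes.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to a sphere: negative inside, positive outside."""
    return math.dist(point, center) - radius

inside = sphere_sdf((0.0, 0.0, 0.0))      # one unit inside the surface
on_surface = sphere_sdf((1.0, 0.0, 0.0))  # exactly on the zero iso-surface
outside = sphere_sdf((0.0, 3.0, 0.0))     # two units outside
print(inside, on_surface, outside)
```

The set of points where the function returns 0 is the surface itself; that set is what "zero iso-surface" (or zero level set) refers to.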
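- Original: Our key idea is to directly regress the continuous SDF from point samples using deep neural networks. The resulting trained network is able to predict the SDF value of a given query position, from which we can extract the zero level-set surface by evaluating spatial samples. Such surface representation can be intuitively understood as a learned binary classifier for which the decision boundary is the surface of the shape itself as depicted in Fig. 2. As a universal function approximator [27], deep feed-forward networks in theory can learn the fully continuous shape functions with arbitrary precision. Yet, the precision of the approximation in practice is limited by the finite number of point samples that guide the decision boundaries and the finite capacity of the network due to restricted compute power.
- Translation: The key idea is to regress the continuous SDF directly from point samples with a deep neural network. The trained network predicts the SDF value at a query position, and the zero level-set surface is extracted by evaluating spatial samples. The representation can be seen as a learned binary classifier whose decision boundary is the shape's surface (Fig. 2). As universal function approximators [27], deep feed-forward networks can in theory learn fully continuous shape functions to arbitrary precision. In practice the precision is limited by the finite number of point samples that guide the decision boundary and by the finite network capacity that limited compute allows.
Note: The core idea is to regress the continuous SDF directly from point samples with a deep network. The network predicts the SDF value at a given position, from which the zero level-set surface is obtained. In practice some approximation error remains.
Question: Why is a deep feed-forward network a universal function approximator?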
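- Original: The most direct application of this approach is to train a single deep network for a given target shape as depicted in Fig. 3a. Given a target shape, we prepare a set of pairs X composed of 3D point samples and their SDF values:

$$X := \{(x, s) : SDF(x) = s\} \tag{2}$$

We train the parameters θ of a multi-layer fully-connected neural network $f_\theta$ on the training set S to make $f_\theta$ a good approximator of the given SDF in the target domain Ω:

$$f_\theta(x) \approx SDF(x),\ \forall x \in \Omega \tag{3}$$

The training is done by minimizing the sum over losses between the predicted and real SDF values of points in X under the following L1 loss function:

$$\mathcal{L}(f_\theta(x), s) = |\,\mathrm{clamp}(f_\theta(x), \delta) - \mathrm{clamp}(s, \delta)\,| \tag{4}$$

where clamp(x, δ) := min(δ, max(−δ, x)) introduces the parameter δ to control the distance from the surface over which we expect to maintain a metric SDF. Larger values of δ allow for fast ray-tracing since each sample gives information of safe step sizes. Smaller values of δ can be used to concentrate network capacity on details near the surface.
- Translation:
- The most direct application is to train a single deep network for one target shape (Fig. 3a). Given the shape, a set X of pairs is prepared, each a 3D sample point with its SDF value (Eq. 2).
- The parameters θ of a multi-layer fully connected network $f_\theta$ are trained so that $f_\theta$ approximates the given SDF well over the target domain Ω (Eq. 3).
- Training minimizes the sum of losses between predicted and true SDF values over the points in X, using the clamped L1 loss (Eq. 4).
- Here clamp(x, δ) := min(δ, max(−δ, x)) introduces the parameter δ, which controls how far from the surface a metric SDF is expected to hold. A larger δ allows fast ray tracing, because every sample tells how large a step is safe. A smaller δ concentrates the network's capacity on detail near the surface.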
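Note: The shape is represented by 3D coordinates and their SDF values, and a fully connected network is trained on that target set. The network is fitted by minimizing the clamped error between predicted and true values.
Doubt: The role of clamp is not clear to me, nor how ray tracing and the safe step size are computed. But this value is important: it can ...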
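To see what clamp does to the loss, here is a small sketch of the clamped L1 loss as described in the text. The helper names `clamp` and `clamped_l1` and the example values are mine; δ = 0.1 is the value the paper quotes for Fig. 3a.

```python
def clamp(x, delta):
    """clamp(x, delta) := min(delta, max(-delta, x))"""
    return min(delta, max(-delta, x))

def clamped_l1(pred, target, delta=0.1):
    """Absolute error between prediction and target after both are clamped."""
    return abs(clamp(pred, delta) - clamp(target, delta))

# Far from the surface both values saturate at delta, so the error is ignored.
far = clamped_l1(pred=0.9, target=0.5)
# Near the surface the loss is the ordinary absolute error.
near = clamped_l1(pred=0.03, target=0.01)
# Predicting the wrong side of the surface is still penalised.
wrong_side = clamped_l1(pred=-0.05, target=0.05)
print(far, near, wrong_side)
```

So δ decides which samples contribute: beyond distance δ from the surface the network only has to get the sign and "at least δ away" right, which frees capacity for detail near the surface.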
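- Original: To generate the 3D model shown in Fig. 3a, we use δ = 0.1 and a feed-forward network composed of eight fully connected layers, each of them applied with dropouts. All internal layers are 512-dimensional and have ReLU non-linearities. The output non-linearity regressing the SDF value is tanh. We found training with batch-normalization [28] to be unstable and applied the weight-normalization technique instead [46]. For training, we use the Adam optimizer [33]. Once trained, the surface is implicitly represented as the zero iso-surface of $f_\theta(x)$, which can be visualized through raycasting or marching cubes. Another nice property of this approach is that accurate normals can be analytically computed by calculating the spatial derivative $\partial f_\theta(x)/\partial x$ via back-propagation through the network.
- Translation: To generate the 3D model in Fig. 3a, the authors use δ = 0.1 and a feed-forward network of eight fully connected layers, each with dropout. All internal layers are 512-dimensional with ReLU non-linearities, and the output non-linearity that regresses the SDF value is tanh. Training with batch normalization [28] was unstable, so weight normalization [46] is used instead. The optimizer is Adam [33]. After training, the surface is the zero iso-surface of $f_\theta(x)$, which can be visualized by raycasting or Marching Cubes. Another useful property is that accurate normals can be computed analytically as the spatial derivative $\partial f_\theta(x)/\partial x$ via back-propagation through the network.
Note: Weight normalization replaces batch normalization to make training stable. This paragraph mainly describes the network setup for the single-shape DeepSDF.
Doubt: Several concepts are still unclear to me: the Adam optimizer, weight normalization, ReLU non-linearities, raycasting, and Marching Cubes visualization.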
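Raycasting against an SDF is usually done by sphere tracing, which is also where the "safe step size" remark comes from: the SDF value at a point is the distance to the nearest surface, so the ray can advance that far without passing through anything. The sketch below uses an analytic sphere in place of a trained network; `sphere_trace` and its parameters are illustrative names, and the ray direction is assumed to have unit length.

```python
import math

def sphere_sdf(p, radius=1.0):
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-6, max_dist=100.0):
    """March along a unit-length ray; return the hit distance t, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t          # close enough to the zero iso-surface
        t += dist             # safe step: no surface is closer than dist
        if t > max_dist:
            return None
    return None

hit = sphere_trace(sphere_sdf, origin=(0.0, 0.0, -3.0), direction=(0.0, 0.0, 1.0))
miss = sphere_trace(sphere_sdf, origin=(0.0, 2.0, -3.0), direction=(0.0, 0.0, 1.0))
print(hit, miss)
```

With a network trained under the clamped loss, the predicted distance never exceeds δ, so each step is at most δ long. That is why a larger δ lets the ray cover empty space in fewer steps.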
Learning the Latent Space of Shapes
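- Original: Training a specific neural network for each shape is neither feasible nor very useful. Instead, we want a model that can represent a wide variety of shapes, discover their common properties, and embed them in a low dimensional latent space. To this end, we introduce a latent vector z, which can be thought of as encoding the desired shape, as a second input to the neural network as depicted in Fig. 3b. Conceptually, we map this latent vector to a 3D shape represented by a continuous SDF. Formally, for some shape indexed by i, $f_\theta$ is now a function of a latent code $z_i$ and a query 3D location x, and outputs the shape's approximate SDF:

$$f_\theta(z_i, x) \approx SDF^i(x) \tag{5}$$

By conditioning the network output on a latent vector, this formulation allows modeling multiple SDFs with a single neural network. Given the decoding model $f_\theta$, the continuous surface associated with a latent vector z is similarly represented with the decision boundary of $f_\theta(z, x)$, and the shape can again be discretized for visualization by, for example, raycasting or Marching Cubes. Next, we motivate the use of encoder-less training before introducing the 'auto-decoder' formulation of the shape-coded DeepSDF.
- Translation: Training a separate network for every shape is neither feasible nor very useful. What is wanted is one model that represents a wide variety of shapes, discovers their common properties, and embeds them in a low-dimensional latent space. To that end a latent vector z, which can be thought of as encoding the desired shape, is fed to the network as a second input (Fig. 3b). Conceptually, the latent vector is mapped to a 3D shape represented by a continuous SDF. Formally, for the shape with index i, $f_\theta$ is a function of the latent code $z_i$ and a 3D query location x, and outputs the approximate SDF of that shape (Eq. 5). Conditioning the output on a latent vector lets a single network model many SDFs. Given the decoder $f_\theta$, the continuous surface for a latent vector z is again the decision boundary of $f_\theta(z, x)$, and it can be discretized for visualization by raycasting or Marching Cubes. The paper next motivates encoder-less training before introducing the auto-decoder formulation of the shape-coded DeepSDF.
Note: To make one model cover different shapes, a latent code vector is introduced that maps to the target shape, so a single network can handle many shapes.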
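The shape-conditioned decoder can be sketched as a small fully connected network whose input is the latent code stacked with the query point. This is only a toy illustration: the weights are random and untrained, the layer sizes are far smaller than the paper's 512-wide layers, and `make_mlp` and `decoder` are hypothetical names.

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Random (untrained) weights for a fully connected net; sizes = [in, hidden..., out]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def decoder(layers, z, x):
    """f(z, x): stack code and query point, apply ReLU layers, tanh on the output."""
    h = list(z) + list(x)
    for index, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return math.tanh(h[0])

latent_dim = 4
layers = make_mlp([latent_dim + 3, 16, 16, 1])
query = (0.1, -0.2, 0.3)
sdf_a = decoder(layers, [0.5, 0.0, 0.0, 0.0], query)  # "shape A" at the query point
sdf_b = decoder(layers, [0.0, 0.5, 0.0, 0.0], query)  # "shape B" at the same point
print(sdf_a, sdf_b)
```

The point of the sketch is the signature: the same weights answer SDF queries for different shapes, and only the code z changes.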
Motivating Encoder-less Learning
- Original: Auto-encoders and encoder-decoder networks are widely used for representation learning as their bottleneck features tend to form natural latent variable representations. Recently, in applications such as modeling depth maps [6], faces [2], and body shapes [34] a full auto-encoder is trained but only the decoder is retained for inference, where they search for an optimal latent vector given some input observation. However, since the trained encoder is unused at test time, it is unclear whether using the encoder is the most effective use of computational resources during training. This motivates us to use an auto-decoder for learning a shape embedding without an encoder as depicted in Fig. 4. We show that applying an auto-decoder to learn continuous SDFs leads to high quality 3D generative models. Further, we develop a probabilistic formulation for training and testing the auto-decoder that naturally introduces latent space regularization for improved generalization. To the best of our knowledge, this work is the first to introduce the auto-decoder learning method to the 3D learning community.
- Translation: Auto-encoders and encoder-decoder networks are widely used for representation learning because their bottleneck features tend to form natural latent variable representations. In recent applications such as modeling depth maps [6], faces [2] and body shapes [34], a full auto-encoder is trained but only the decoder is kept for inference, where an optimal latent vector is searched for given an observation. Since the trained encoder goes unused at test time, it is unclear whether training one is the best use of compute. This motivates learning the shape embedding with an auto-decoder and no encoder (Fig. 4). The paper shows that an auto-decoder trained on continuous SDFs gives high-quality 3D generative models, and develops a probabilistic formulation for training and testing it that naturally introduces latent space regularization and improves generalization. To the authors' knowledge this is the first work to bring auto-decoder learning to the 3D learning community.
Note: Learning continuous SDFs with an auto-decoder yields high-quality 3D generative models.
Auto-decoder-based DeepSDF Formulation
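- Original: To derive the auto-decoder-based shape-coded DeepSDF formulation we adopt a probabilistic perspective. Given a dataset of N shapes represented with signed distance functions $\{SDF^i\}_{i=1}^{N}$, we prepare a set of K point samples and their signed distance values:

$$X_i = \{(x_j, s_j) : s_j = SDF^i(x_j)\} \tag{6}$$

- Translation: The auto-decoder-based, shape-coded DeepSDF formulation is derived from a probabilistic point of view. Given a dataset of N shapes represented by signed distance functions $\{SDF^i\}_{i=1}^{N}$, a set of K sample points with their signed distance values is prepared for each shape (Eq. 6).
- Original: For an auto-decoder, as there is no encoder, each latent code $z_i$ is paired with training shape $X_i$. The posterior over shape code $z_i$ given the shape SDF samples $X_i$ can be decomposed as:

$$p_\theta(z_i \mid X_i) = p(z_i) \prod_{(x_j, s_j) \in X_i} p_\theta(s_j \mid z_i; x_j) \tag{7}$$

- Translation: Because an auto-decoder has no encoder, each latent code $z_i$ is paired directly with a training shape $X_i$. The posterior over the shape code $z_i$ given the SDF samples $X_i$ factorizes into the prior times the per-sample likelihoods (Eq. 7).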
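- Original: where θ parameterizes the SDF likelihood. In the latent shape-code space, we assume the prior distribution over codes $p(z_i)$ to be a zero-mean multivariate Gaussian with a spherical covariance $\sigma^2 I$. This prior encapsulates the notion that the shape codes should be concentrated and we empirically found it was needed to infer a compact shape manifold and to help converge to good solutions.
- Translation: Here θ parameterizes the SDF likelihood. In the latent shape-code space, the prior over codes $p(z_i)$ is assumed to be a zero-mean multivariate Gaussian with spherical covariance $\sigma^2 I$. The prior expresses that shape codes should be concentrated. Empirically it was needed to infer a compact shape manifold and to help convergence to good solutions.
- Original: In the auto-decoder-based DeepSDF formulation we express the SDF likelihood via a deep feed-forward network $f_\theta(z_i, x_j)$ and, without loss of generality, assume that the likelihood takes the form:

$$p_\theta(s_j \mid z_i; x_j) = \exp(-\mathcal{L}(f_\theta(z_i, x_j), s_j)) \tag{8}$$

- Translation: In the auto-decoder-based DeepSDF formulation, the SDF likelihood is expressed through a deep feed-forward network $f_\theta(z_i, x_j)$ and, without loss of generality, assumed to take the exponential form of Eq. 8.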
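- Original: The SDF prediction $\tilde{s}_j = f_\theta(z_i, x_j)$ is represented using a fully-connected network. $\mathcal{L}(\tilde{s}_j, s_j)$ is a loss function penalizing the deviation of the network prediction from the actual SDF value $s_j$. One example for the cost function is the standard L2 loss function which amounts to assuming Gaussian noise on the SDF values. In practice we use the clamped L1 cost from Eq. 4 for reasons outlined previously.
- Translation: The SDF prediction $\tilde{s}_j = f_\theta(z_i, x_j)$ is produced by a fully connected network. $\mathcal{L}(\tilde{s}_j, s_j)$ is a loss that penalizes the deviation of the prediction from the actual SDF value $s_j$. One example is the standard L2 loss, which amounts to assuming Gaussian noise on the SDF values. In practice the clamped L1 cost of Eq. 4 is used, for the reasons given earlier.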
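These notes skip the two equations the later sections refer to (the training objective and the inference step, Eq. 10). They follow from taking the negative log of the posterior described above. The block below is my own derivation from the stated assumptions (a zero-mean Gaussian prior with covariance $\sigma^2 I$ and an exponential likelihood); as far as I recall the paper writes the regularization weight as $1/\sigma^2$ rather than $1/(2\sigma^2)$, which only rescales the constant, so check the exact form against the paper.

```latex
% Assumptions stated in the text: posterior = prior times per-sample likelihoods,
% exponential likelihood, zero-mean Gaussian prior with covariance \sigma^2 I.
p_\theta(z_i \mid X_i) = p(z_i) \prod_{(x_j, s_j) \in X_i} p_\theta(s_j \mid z_i; x_j),
\quad
p_\theta(s_j \mid z_i; x_j) = \exp\!\big(-\mathcal{L}(f_\theta(z_i, x_j), s_j)\big),
\quad
p(z_i) \propto \exp\!\Big(-\tfrac{1}{2\sigma^2} \lVert z_i \rVert_2^2\Big)

% Negative log posterior (constants collected in "const"):
-\log p_\theta(z_i \mid X_i)
  = \sum_{(x_j, s_j) \in X_i} \mathcal{L}(f_\theta(z_i, x_j), s_j)
  + \frac{1}{2\sigma^2} \lVert z_i \rVert_2^2 + \text{const}

% Training: minimise jointly over the decoder weights and all shape codes.
\arg\min_{\theta, \{z_i\}_{i=1}^{N}} \sum_{i=1}^{N}
  \Big( \sum_{j=1}^{K} \mathcal{L}(f_\theta(z_i, x_j), s_j)
  + \frac{1}{2\sigma^2} \lVert z_i \rVert_2^2 \Big)

% Inference (MAP estimate): \theta is fixed, only the code of the new shape is optimised.
\hat{z} = \arg\min_{z} \sum_{(x_j, s_j) \in X}
  \mathcal{L}(f_\theta(z, x_j), s_j) + \frac{1}{2\sigma^2} \lVert z \rVert_2^2
```

The regularization term on the code is exactly what the Gaussian prior contributes, which is why the text says the probabilistic view "naturally introduces" latent space regularization.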
- Original: Crucially, this formulation is valid for SDF samples X of arbitrary size and distribution because the gradient of the loss with respect to z can be computed separately for each SDF sample. This implies that DeepSDF can handle any form of partial observations such as depth maps. This is a major advantage over the auto-encoder framework whose encoder expects a test input similar to the training data, e.g. shape completion networks of [16, 58] require preparing training data of partial shapes.
- Translation: Crucially, the formulation holds for SDF samples X of any size and distribution, because the gradient of the loss with respect to z can be computed separately for each sample. DeepSDF can therefore handle any form of partial observation, such as a depth map. This is a major advantage over the auto-encoder framework, whose encoder expects test inputs that resemble the training data. For example, the shape completion networks of [16, 58] need training data made of partial shapes.
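Here is a toy sketch of auto-decoder inference: the decoder is fixed and only the code is optimized to explain a handful of samples. Everything in it is a stand-in of mine, not the paper's setup. The "decoder" is the analytic SDF of an origin-centred sphere whose radius is a one-dimensional code z, the optimizer is plain (sub)gradient descent instead of Adam, and `delta`, `lam`, `lr` and `steps` are illustrative values.

```python
def clamp(v, delta):
    return min(delta, max(-delta, v))

def decoder(z, x):
    """Toy fixed decoder: SDF of an origin-centred sphere whose radius is the code z."""
    norm = sum(c * c for c in x) ** 0.5
    return norm - z

def infer_code(samples, delta=0.5, lam=1e-4, lr=0.01, steps=1000):
    """Minimise mean clamped-L1 loss plus lam * z^2 over z by subgradient descent."""
    z = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, s in samples:
            f = decoder(z, x)
            if abs(f) < delta:  # clamp inactive, so d(clamp f)/dz = df/dz = -1
                err = f - clamp(s, delta)
                grad += -1.0 if err > 0 else (1.0 if err < 0 else 0.0)
        grad = grad / len(samples) + 2.0 * lam * z
        z -= lr * grad
    return z

# Partial observation: SDF samples along the +x axis only, for a sphere of radius 0.7.
true_radius = 0.7
radii = [0.3 + 0.1 * k for k in range(9)]
samples = [((r, 0.0, 0.0), r - true_radius) for r in radii]
z_hat = infer_code(samples)
print(z_hat)
```

Each sample contributes its own term to the gradient, so the number and placement of samples is free. Here nine points on one axis are enough to recover a code that describes the whole sphere, which is the mechanism behind shape completion from partial observations.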
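- Original: To incorporate the latent shape code, we stack the code vector and the sample location as depicted in Fig. 3b and feed it into the same fully-connected NN described previously at the input layer and additionally at the 4th layer. We again use the Adam optimizer [33]. The latent vector z is initialized randomly from $\mathcal{N}(0, 0.01^2)$.
* Note that while both VAE and the proposed auto-decoder formulation share the zero-mean Gaussian prior on the latent codes, we found that the stochastic nature of the VAE optimization did not lead to good training results.
- Translation: To incorporate the latent shape code, the code vector and the sample location are stacked (Fig. 3b) and fed into the same fully connected network as before, at the input layer and again at the 4th layer. Adam [33] is used again. The latent vector z is initialized randomly from $\mathcal{N}(0, 0.01^2)$.
* Note that although the VAE and the proposed auto-decoder share the zero-mean Gaussian prior on the latent codes, the authors found that the stochastic nature of VAE optimization did not lead to good training results.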
Data Preparation
- Original: To train our continuous SDF model, we prepare the SDF samples X (Eq. 2) for each mesh, which consists of 3D points and their SDF values. While SDF can be computed through a distance transform for any watertight shapes from real or synthetic data, we train with synthetic objects (e.g. ShapeNet [12]), for which we are provided complete 3D shape meshes. To prepare data, we start by normalizing each mesh to a unit sphere and sampling 500,000 spatial points x: we sample more aggressively near the surface of the object as we want to capture a more detailed SDF near the surface. For an ideal oriented watertight mesh, computing the signed distance value of x would only involve finding the closest triangle, but we find that human-designed meshes are commonly not watertight and contain undesired internal structures.
- Translation: To train the continuous SDF model, SDF samples X (Eq. 2), made of 3D points and their SDF values, are prepared for each mesh. An SDF can be computed by a distance transform for any watertight shape, real or synthetic, but training uses synthetic objects (e.g. ShapeNet [12]) for which complete 3D meshes are available. Each mesh is first normalized to the unit sphere and 500,000 spatial points x are sampled, more densely near the object surface so that the SDF is captured in more detail there. For an ideal oriented watertight mesh, the signed distance of x only requires finding the closest triangle, but human-designed meshes are often not watertight and contain unwanted internal structures.
Question: What does an ideal oriented watertight mesh mean?
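- Original: To obtain the shell of a mesh with proper orientation, we set up equally spaced virtual cameras around the object, and densely sample surface points, denoted $P_s$, with surface normals oriented towards the camera. Double-sided triangles visible from both orientations (indicating that the shape is not closed) cause problems in this case, so we discard mesh objects containing too many of such faces. Then, for each x, we find the closest point in $P_s$, from which the SDF(x) can be computed. We refer readers to supplementary material for further details.
- Translation: To obtain the outer shell of a mesh with the proper orientation, equally spaced virtual cameras are placed around the object and surface points, denoted $P_s$, are sampled densely with their normals oriented towards the camera. Double-sided triangles that are visible from both orientations (a sign that the shape is not closed) cause problems here, so meshes with too many such faces are discarded. Then, for each x, the closest point in $P_s$ is found, and SDF(x) is computed from it. Further details are in the supplementary material.
Question: What problems do double-sided triangles visible from both orientations cause?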
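The "sample more aggressively near the surface" step can be sketched as follows. A unit sphere stands in for a normalized mesh so that the exact SDF is available in closed form. The 90/10 split, the noise scale `sigma` and the function names are illustrative assumptions of mine, not the paper's values.

```python
import math
import random

def sphere_sdf(p):
    """Exact SDF of the unit sphere, standing in for a normalized mesh."""
    return math.sqrt(sum(c * c for c in p)) - 1.0

def sample_training_points(n, near_fraction=0.9, sigma=0.05, seed=0):
    """Return (point, sdf) pairs, most of them perturbed surface points."""
    rng = random.Random(seed)
    n_near = int(n * near_fraction)
    samples = []
    for i in range(n):
        if i < n_near:
            # Uniform point on the surface, then a small Gaussian offset.
            g = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(c * c for c in g))
            p = tuple(c / norm + rng.gauss(0.0, sigma) for c in g)
        else:
            # Uniform point in the bounding cube, to cover the rest of space.
            p = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
        samples.append((p, sphere_sdf(p)))
    return samples

samples = sample_training_points(1000)
near = samples[:900]
share_close = sum(abs(s) < 0.2 for _, s in near) / len(near)
print(len(samples), share_close)
```

For a real mesh the closed-form `sphere_sdf` would be replaced by the nearest-point lookup in $P_s$ described above.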
Results
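- Original: We conduct a number of experiments to show the representational power of DeepSDF, both in terms of its ability to describe geometric details and its generalization capability to learn a desirable shape embedding space. Largely, we propose four main experiments designed to test its ability to
  1) represent training data,
  2) use learned feature representation to reconstruct unseen shapes,
  3) apply shape priors to complete partial shapes, and
  4) learn smooth and complete shape embedding space from which we can sample new shapes.

  For all experiments we use the popular ShapeNet dataset.
- Translation: A series of experiments shows the representational power of DeepSDF, both its ability to describe geometric detail and its ability to generalize by learning a good shape embedding space. Four main experiments test its ability to
  1) represent the training data,
  2) reconstruct unseen shapes with the learned feature representation,
  3) complete partial shapes using shape priors, and
  4) learn a smooth and complete shape embedding space from which new shapes can be sampled.

  All experiments use the popular ShapeNet dataset.
Note: The four experiments cover representing the training data, reconstructing unseen shapes, completing partial shapes with shape priors, and learning a smooth, complete shape embedding space.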
- Original: We select a representative set of 3D learning approaches to comparatively evaluate aforementioned criteria: a recent octree-based method (OGN) [52], a mesh-based method (AtlasNet) [22], and a volumetric SDF-based shape completion method (3D-EPN) [16] (Table 1). These works show state-of-the-art performance in their respective representations and tasks, so we omit comparisons with the works that have already been compared: e.g. OGN's octree model outperforms regular voxel approaches, while AtlasNet compares itself with various points, mesh, or voxel based methods and 3D-EPN with various completion methods.
- Translation: A representative set of 3D learning methods is chosen for the comparison: a recent octree-based method (OGN) [52], a mesh-based method (AtlasNet) [22], and a volumetric SDF-based shape completion method (3D-EPN) [16] (Table 1). These works are state of the art for their representations and tasks, so methods they have already been compared against are omitted. For example, OGN's octree model outperforms regular voxel methods, AtlasNet compares itself with various point, mesh and voxel based methods, and 3D-EPN with various completion methods.
Note: DeepSDF is compared against 3D-EPN, OGN and AtlasNet-Sphere.
6.1. Representing Known 3D Shapes
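- Original: First, we evaluate the capacity of the model to represent known shapes, i.e. shapes that were in the training set, from only a restricted-size latent code, testing the limit of expressive capability of the representations. Quantitative comparison in Table 2 shows that the proposed DeepSDF significantly beats OGN and AtlasNet in Chamfer distance against the true shape computed with a large number of points (30,000). The difference in earth mover distance (EMD) is smaller because 500 points do not well capture the additional precision. Figure 5 shows a qualitative comparison of DeepSDF to OGN.
- Translation: First, the model's capacity to represent known shapes (shapes in the training set) from only a restricted-size latent code is evaluated, which tests the limit of each representation's expressiveness. The quantitative comparison in Table 2 shows that DeepSDF clearly beats OGN and AtlasNet in Chamfer distance to the true shape, computed with a large number of points (30,000). The difference in earth mover's distance (EMD) is smaller because 500 points do not capture the extra precision well. Figure 5 compares DeepSDF and OGN qualitatively.
Note: This experiment evaluates how well the model represents known shapes. With enough evaluation points, DeepSDF is clearly ahead of the compared methods. Shapes already in the dataset can be reconstructed well from their code and the SDF values at query points.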
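For reference, here is a brute-force sketch of a Chamfer distance between two point sets. It implements one common variant (mean squared nearest-neighbour distance, summed over both directions); the paper's exact normalization may differ, and the function name is mine.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both ways."""
    def squared_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    a_to_b = sum(min(squared_dist(p, q) for q in points_b) for p in points_a) / len(points_a)
    b_to_a = sum(min(squared_dist(q, p) for p in points_a) for q in points_b) / len(points_b)
    return a_to_b + b_to_a

square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
identical = chamfer_distance(square, square)
shifted = chamfer_distance([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)])
print(identical, shifted)
```

This version costs O(|A|·|B|); with 30,000 points per shape a spatial index such as a k-d tree would normally replace the inner `min`.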
6.2 Representing Test 3D Shapes (auto-encoding)
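- Original: For encoding unknown shapes, i.e. shapes in the held-out test set, DeepSDF again significantly outperforms AtlasNet on a wide variety of shape classes and metrics as shown in Table 3. Note that AtlasNet performs reasonably well at classes of shapes that have mostly consistent topology without holes (like planes) but struggles more on classes that commonly have holes, like chairs. This is shown in Fig. 6 where AtlasNet fails to represent the fine detail of the back of the chair. Figure 7 shows more examples of detailed reconstructions on test data from DeepSDF as well as two example failure cases.
- Translation: For encoding unknown shapes (shapes in the held-out test set), DeepSDF again clearly outperforms AtlasNet across a wide range of shape classes and metrics (Table 3). AtlasNet does reasonably well on classes with mostly consistent, hole-free topology (such as airplanes) but struggles on classes that often have holes, such as chairs. Fig. 6 shows AtlasNet failing to represent the fine detail of a chair back. Figure 7 shows more detailed DeepSDF reconstructions on test data, along with two failure cases.
Note: DeepSDF also reconstructs shapes with holes well. AtlasNet performs reasonably on hole-free classes but struggles on classes with holes, and DeepSDF outperforms it overall.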
Shape Completion
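- Original: A major advantage of the proposed DeepSDF approach for representation learning is that inference can be performed from an arbitrary number of SDF samples. In the DeepSDF framework, shape completion amounts to solving for the shape code that best explains a partial shape observation via Eq. 10. Given the shape code, a complete shape can be rendered using the priors encoded in the decoder.
- Translation: A major advantage of DeepSDF for representation learning is that inference works from any number of SDF samples. In the DeepSDF framework, shape completion amounts to solving Eq. 10 for the shape code that best explains a partial observation. Given that code, the complete shape is rendered using the priors encoded in the decoder.
Note: A shape code represents the object's shape and is what makes completion possible. The biggest advantage of DeepSDF is that inference can use any number of samples.
I barely understood this part.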
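- Original: We test our completion scheme using single view depth observations which is a common use-case and maps well to our architecture without modification. Note that we currently require the depth observations in the canonical shape frame of reference.
- To generate SDF point samples from the depth image observation, we sample two points for each depth observation, each of them located η distance away from the measured surface point (along surface normal estimate). With small η we approximate the signed distance value of those points to be η and −η, respectively. We solve for Eq. 10 with loss function of Eq. 4 using clamp value of η. Additionally, we incorporate free-space observations, (i.e. empty-space between surface and camera), by sampling points along the freespace-direction and enforce larger-than-zero constraints. The freespace loss is |fθ(z, xj)| if fθ(z, xj) < 0 and 0 otherwise.
- Translation: We test our completion scheme on single-view depth observations, a common use case that maps onto our architecture without modification. Note that we currently require the depth observations to be given in the canonical frame of reference of the shape.
- To generate SDF point samples from a depth image, we sample two points per depth observation, each at distance η from the measured surface point along the estimated surface normal. For small η, we approximate the signed distances of these two points as η and −η. We solve Eq. 10 with the loss of Eq. 4, using η as the clamp value. We also incorporate free-space observations (the empty space between surface and camera) by sampling points along the free-space direction and enforcing that the predicted SDF there is greater than zero. The free-space loss is |fθ(z, xj)| if fθ(z, xj) < 0 and 0 otherwise.
Takeaway: The completion scheme is tested on single-view depth observations.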
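The sampling recipe above can be sketched in a few lines of NumPy. This is only an illustration under assumptions of my own: the function names are hypothetical, the depth image is assumed to be already back-projected to 3D surface points with outward-facing normal estimates, and the free-space points are drawn uniformly along each camera ray (the paper does not say how they are distributed).

```python
import numpy as np

def sdf_samples_from_depth(points, normals, eta):
    """For each measured surface point, emit two samples at +eta and -eta along the normal."""
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    outside = points + eta * normals  # approximate SDF = +eta
    inside = points - eta * normals   # approximate SDF = -eta
    xyz = np.concatenate([outside, inside])
    sdf = np.concatenate([np.full(len(points), eta), np.full(len(points), -eta)])
    return xyz, sdf

def freespace_samples(points, camera, n_per_ray, rng):
    """Sample points strictly between the camera and each measured surface point."""
    t = rng.uniform(0.05, 0.95, size=(len(points), n_per_ray, 1))
    return (camera + t * (points[:, None, :] - camera)).reshape(-1, 3)

def freespace_loss(pred_sdf):
    """|f| where the predicted SDF is negative, 0 otherwise."""
    return np.where(pred_sdf < 0, np.abs(pred_sdf), 0.0)

# Toy scene: a unit sphere seen from a camera on the +z axis.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
pts = pts[pts[:, 2] > 0.5]             # keep only points facing the camera
camera = np.array([0.0, 0.0, 3.0])
eta = 0.01

xyz, sdf = sdf_samples_from_depth(pts, pts, eta)  # on a unit sphere, normal == position
free = freespace_samples(pts, camera, n_per_ray=4, rng=rng)
```

The `(xyz, sdf)` pairs feed the clamped loss with clamp value η, while the decoder's predictions at the `free` points go through `freespace_loss`, which only penalizes predictions that wrongly place free space inside the shape.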
-
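Original: Given the SDF point samples and empty space points, we similarly optimize the latent vector using MAP estimation. Tab. 4 and Figs. (22, 9) respectively shows quantitative and qualitative shape completion results. Compared to one of the most recent completion approaches [16] using volumetric shape representation, our continuous SDF approach produces more visually pleasing and accurate shape reconstructions. While a few recent shape completion methods were presented [24, 55], we could not find the code to run the comparisons, and their underlying 3D representation is voxel grid which we extensively compare against.
-
Translation: Given the SDF point samples and the free-space points, we optimize the latent vector by MAP (maximum a posteriori) estimation, as before. Table 4 and Figs. 22 and 9 show quantitative and qualitative shape completion results, respectively. Compared with one of the most recent completion approaches [16], which uses a volumetric shape representation, our continuous SDF approach produces more visually pleasing and more accurate reconstructions. A few other recent shape completion methods have been presented [24, 55], but we could not find code to run comparisons, and their underlying 3D representation is a voxel grid, which we compare against extensively.
Takeaway: Compared with the voxel-based completion method [16], DeepSDF produces better reconstructions.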
Latent Space Shape Interpolation
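- Original: To show that our learned shape embedding is complete and continuous, we render the results of the decoder when a pair of shapes are interpolated in the latent vector space (Fig. 1). The results suggests that the embedded continuous SDF’s are of meaningful shapes and that our representation extracts common interpretable shape features, such as the arms of a chair, that interpolate linearly in the latent space.
- Translation: To show that our learned shape embedding is complete and continuous, we render the decoder output while interpolating between a pair of shapes in the latent vector space (Fig. 1). The results suggest that the embedded continuous SDFs correspond to meaningful shapes, and that our representation extracts common, interpretable shape features, such as the arms of a chair, that interpolate linearly in the latent space.
Takeaway: The learned latent space is continuous: decoding codes interpolated between two shapes still yields meaningful shapes.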
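Latent interpolation itself is just a straight line between two codes, with the decoder evaluated at each step. The sketch below shows the mechanics only, under stated assumptions: `toy_decoder` is a hypothetical stand-in for the trained network (a sphere whose center and radius are read directly from a two-dimensional code), so it says nothing about how well a real model interpolates.

```python
import numpy as np

def toy_decoder(z, x):
    """Hypothetical stand-in for f_theta: sphere centered at (z[0], 0, 0) with radius z[1]."""
    center = np.array([z[0], 0.0, 0.0])
    return np.linalg.norm(x - center, axis=-1) - z[1]

def interpolate_codes(z_a, z_b, num):
    """Return `num` codes evenly spaced on the segment from z_a to z_b."""
    ts = np.linspace(0.0, 1.0, num)
    return [(1.0 - t) * z_a + t * z_b for t in ts]

z_a = np.array([-0.5, 0.2])  # small sphere on the left
z_b = np.array([0.5, 0.6])   # larger sphere on the right
codes = interpolate_codes(z_a, z_b, 5)

# Evaluate each interpolated shape at one query point (the origin).
query = np.array([[0.0, 0.0, 0.0]])
sdf_at_origin = [float(toy_decoder(z, query)[0]) for z in codes]
```

With a trained DeepSDF decoder, each interpolated code would instead be decoded on a dense grid of query points and the zero level set extracted to render the intermediate shape.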
7 Conclusion & Future Work
-
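Original: DeepSDF significantly outperforms (i.e., does better than) the applicable benchmarked methods across shape representation and completion tasks and simultaneously addresses the goals of representing complex topologies, closed surfaces, while providing high quality surface normals of the shape. However, while point-wise forward sampling of a shape’s SDF is efficient, shape completion (auto-decoding) takes considerably more time during inference due to the need for explicit optimization over the latent vector. We look to increase performance by replacing ADAM optimization with more efficient Gauss-Newton or similar methods that make use of the analytic derivatives of the model.
-
Translation: DeepSDF significantly outperforms the applicable benchmarked methods on shape representation and completion tasks, and at the same time meets the goals of representing complex topologies and closed surfaces while providing high-quality surface normals. However, although point-wise forward sampling of a shape's SDF is efficient, shape completion (auto-decoding) takes considerably longer at inference time, because the latent vector must be optimized explicitly. We hope to improve performance by replacing Adam optimization with more efficient Gauss-Newton or similar methods that exploit the analytic derivatives of the model.
Takeaway: Compared with the benchmarked methods, DeepSDF handles complex topologies and closed surfaces and provides high-quality surface normals. Its weakness is that shape completion (auto-decoding) is slow at inference, because the latent vector has to be optimized explicitly, currently with Adam.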
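- Original: DeepSDF models enable representation of more complex shapes without discretization errors with significantly less memory than previous state-of-the-art results as shown in Table 1, demonstrating an exciting route ahead for 3D shape learning. The clear ability to produce quality latent shape space interpolation opens the door to reconstruction algorithms operating over scenes built up of such efficient encodings. However, DeepSDF currently assumes models are in a canonical pose and as such completion in-the-wild requires explicit optimization over a SE(3) transformation space increasing inference time. Finally, to represent the true space-of-possible-scenes including dynamics and textures in a single embedding remains a major challenge, one which we continue to explore.
- Translation: As Table 1 shows, DeepSDF models can represent more complex shapes without discretization errors and with significantly less memory than previous state-of-the-art results, which points to an exciting route ahead for 3D shape learning. The clear ability to produce good interpolations in the latent shape space opens the door to reconstruction algorithms that operate on scenes built from such efficient encodings. However, DeepSDF currently assumes that models are in a canonical pose, so completion in the wild requires explicit optimization over the SE(3) transformation space, which increases inference time. Finally, representing the true space of possible scenes, including dynamics and textures, in a single embedding remains a major challenge that we continue to explore.
Takeaway: DeepSDF's advantages: it represents more complex shapes, has no discretization errors, and uses far less memory. Its limitations: it assumes a canonical pose, and it does not yet model dynamics or textures.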
Supplementary
Overview
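- Original: This supplementary material provides quantitative and qualitative experimental results along with extended technical details that are supplementary to the main paper. We first describe the shape completion experiment with noisy depth maps using DeepSDF (Sec. B). We then discuss architecture details (Sec. C) along with experiments exploring characteristics and tradeoffs of the DeepSDF design decisions (Sec. D). In Sec. E we compare auto-decoders with variational and standard auto-encoders. Further, additional details on data preparation (Sec. F), training (Sec. G), the auto-decoder learning scheme (Sec. H), and quantitative evaluations (Sec. I) are presented, and finally in Sec. J we provide additional quantitative and qualitative results.
- Translation: This supplementary material provides quantitative and qualitative experimental results, together with extended technical details that supplement the main paper. We first describe the shape completion experiment on noisy depth maps using DeepSDF (Sec. B). We then discuss architecture details (Sec. C), along with experiments that explore the characteristics and trade-offs of the DeepSDF design decisions (Sec. D). In Sec. E we compare auto-decoders with variational and standard auto-encoders. We also give further details on data preparation (Sec. F), training (Sec. G), the auto-decoder learning scheme (Sec. H), and quantitative evaluation (Sec. I), and finally Sec. J provides additional quantitative and qualitative results.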
Tags: Distance, Functions, DeepSDF, shape, SDF, translation, 3D Source: https://blog.csdn.net/Blackoutdragon/article/details/114384525