Artificial Intelligence | ShowMeAI News Daily #2022.06.22

Keep creating and keep growing! This is day 24 of my participation in the "June Writing Challenge"; click to see the event details.

The ShowMeAI Daily series has been fully upgraded! It covers AI topics across Tools & Frameworks | Projects & Code | Blogs & Shares | Data & Resources | Research & Papers. Click to view the list of past issues, and subscribe to the topic #ShowMeAI资讯日报 in the official account to receive the latest daily pushes. Click Collections & Monthly E-zine to quickly browse the full set of each topic.

1. Tools & Frameworks

Tool: Unclutter – Immersive Reading Mode, a browser extension for distraction-free, focused reading

‘Unclutter – Immersive Reading Mode – A reader mode browser extension to remove distractions from web articles.’ by lindylearn

GitHub: github.com/lindylearn/…

Library: scikit-opt – a pure-Python swarm intelligence algorithm library

It includes many algorithms (differential evolution, genetic algorithm, particle swarm optimization, simulated annealing, ant colony optimization, artificial fish swarm, immune optimization). It is lightweight, easy to deploy, and supports GPU computation.

GitHub: github.com/guofei9987/…
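
A minimal usage sketch, assuming the `sko.GA` interface (`func`, `n_dim`, `size_pop`, `max_iter`, `lb`, `ub`) documented in the repository; the sphere objective is just a toy function for illustration:

```python
# Minimal sketch of scikit-opt's genetic algorithm on a toy objective.
from sko.GA import GA

def sphere(x):
    # Toy objective: sum of squares, minimum at the origin.
    return sum(xi ** 2 for xi in x)

ga = GA(func=sphere, n_dim=3, size_pop=50, max_iter=200,
        lb=[-1, -1, -1], ub=[1, 1, 1])
best_x, best_y = ga.run()
print(best_x, best_y)
```

The other optimizers in the package (PSO, SA, ACA, and so on) follow a similar define-the-objective-then-run pattern.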

Tool: Hayabusa – a Sigma-based Windows event log analysis tool

It helps security analysts quickly find security threats.

GitHub: github.com/Yamato-Secu…

Tool: Gifsicle – a tool for editing GIFs in the browser

Gifsicle can compress, rotate, crop, and otherwise edit GIF images.

GitHub: github.com/renzhezhilu…

Library: AREkit – a document-level attitude and relation extraction toolkit

‘AREkit – Document level Attitude and Relation Extraction toolkit (AREkit) for mass-media news and analytical articles’ by Nicolay Rusnachenko

GitHub: github.com/nicolay-r/A…

2. Blogs & Shares

Course: National University of Singapore, "3D Computer Vision"

《3D Computer Vision | National University of Singapore – YouTube》

Link: www.youtube.com/playlist?li…

Blog: A comprehensive guide to Vim commands, operations, and shortcuts

Link: weibo.com/ttarticle/p…

3. Data & Resources

Resource list: an ongoing list of recent papers on deep learning for 3D vision

‘Trending-in-3D-Vision – An on-going paper list on new trends in 3D vision with deep learning’ by Xiaolong

GitHub: github.com/dragonlong/…

Book: "Python Data Science Handbook" – data science with Python

A book introducing data science and its applications. It covers: ① the computing environment data scientists need: IPython and Jupyter; ② the NumPy library and scientific computing; ③ Pandas and data wrangling; ④ Matplotlib and data visualization; ⑤ Scikit-Learn and machine learning.

English original: jakevdp.github.io/PythonDataS…

Unofficial Chinese translation: github.com/wangyingsm/…
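
As a quick taste of the stack the book covers, here is a small self-contained snippet that touches NumPy, Pandas, and Scikit-Learn together (synthetic data, not an example taken from the book):

```python
# NumPy arrays, a Pandas DataFrame, and a Scikit-Learn model in one tiny workflow.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0, 10, 100)})
df["y"] = 2.0 * df["x"] + rng.normal(0, 1, 100)   # noisy linear relationship

model = LinearRegression().fit(df[["x"]], df["y"])
print(model.coef_, model.intercept_)              # close to 2.0 and 0.0
```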

4. Research & Papers

Reply with the keyword 日报 (Daily) in the official account to get the curated June paper collection for free.

Paper: Automatic Prosody Annotation with Pre-Trained Text-Speech Model

Title: Automatic Prosody Annotation with Pre-Trained Text-Speech Model

Published: 16 Jun 2022

Field: Speech

Tasks: Speech Synthesis, Text-To-Speech Synthesis

Paper link: arxiv.org/abs/2206.07…

Code: github.com/daisyqk/aut…

Authors: Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu

Summary: Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability.

Abstract: Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability. However, the acquisition of prosodic boundary labels relies on manual annotation, which is costly and time-consuming. In this paper, we propose to automatically extract prosodic boundary labels from text-audio data via a neural text-speech model with pre-trained audio encoders. This model is pre-trained on text and speech data separately and jointly fine-tuned on TTS data in a triplet format: {speech, text, prosody}. The experimental results on both automatic evaluation and human evaluation demonstrate that: 1) the proposed text-speech prosody annotation framework significantly outperforms text-only baselines; 2) the quality of automatic prosodic boundary annotations is comparable to human annotations; 3) TTS systems trained with model-annotated boundaries are slightly better than systems that use manual ones.
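
From the abstract, the annotation step boils down to predicting a boundary label per text token from fused text and audio features. The snippet below is a hedged PyTorch sketch of that general idea only, not the authors' model: the cross-attention fusion, feature dimensions, and number of boundary classes are all illustrative assumptions.

```python
# Hedged sketch: fuse per-token text features with pre-trained audio features and
# classify a prosodic-boundary label for each token.
import torch
import torch.nn as nn

class ProsodyTagger(nn.Module):
    def __init__(self, d_model=256, n_labels=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(d_model, n_labels)

    def forward(self, text_feats, audio_feats):
        # text_feats:  (B, T_text, d)  from a text encoder
        # audio_feats: (B, T_audio, d) from a pre-trained audio encoder
        fused, _ = self.cross_attn(text_feats, audio_feats, audio_feats)
        return self.classifier(fused)      # (B, T_text, n_labels) boundary logits

tagger = ProsodyTagger()
logits = tagger(torch.randn(2, 10, 256), torch.randn(2, 50, 256))
print(logits.shape)  # torch.Size([2, 10, 4])
```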

Paper: Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot

Title: Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot

Published: 16 Jun 2022

Field: Computer Vision

Tasks: Autonomous Driving

Paper link: arxiv.org/abs/2206.08…

Code: github.com/openpercept…

Authors: Li Chen, Tutian Tang, Zhitian Cai, Yang Li, Penghao Wu, Hongyang Li, Jianping Shi, Junchi Yan, Yu Qiao

Summary: Equipped with a wide span of sensors, predominant autonomous driving solutions are becoming more modular-oriented for safe system design.

Abstract: Equipped with a wide span of sensors, predominant autonomous driving solutions are becoming more modular-oriented for safe system design. Though these sensors have laid a solid foundation, most massive-production solutions up to date still fall into L2 phase. Among these, Comma.ai comes to our sight, claiming one $999 aftermarket device mounted with a single camera and board inside owns the ability to handle L2 scenarios. Together with open-sourced software of the entire system released by Comma.ai, the project is named Openpilot. Is it possible? If so, how is it made possible? With curiosity in mind, we deep-dive into Openpilot and conclude that its key to success is the end-to-end system design instead of a conventional modular framework. The model is briefed as Supercombo, and it can predict the ego vehicle's future trajectory and other road semantics on the fly from monocular input. Unfortunately, the training process and massive amount of data to make all these work are not publicly available. To achieve an intensive investigation, we try to reimplement the training details and test the pipeline on public benchmarks. The refactored network proposed in this work is referred to as OP-Deepdive. For a fair comparison of our version to the original Supercombo, we introduce a dual-model deployment scheme to test the driving performance in the real world. Experimental results on nuScenes, Comma2k19, CARLA, and in-house realistic scenarios verify that a low-cost device can indeed achieve most L2 functionalities and be on par with the original Supercombo model. In this report, we would like to share our latest findings, shed some light on the new perspective of end-to-end autonomous driving from an industrial product-level side, and potentially inspire the community to continue improving the performance. Our code, benchmarks are at github.com/OpenPercept…

Paper: Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation

Title: Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation

Published: 15 Jun 2022

Field: Computer Vision

Tasks: Contrastive Learning, Denoising, Image Generation, Music Generation

Paper link: arxiv.org/abs/2206.07…

Code: github.com/l-yezhu/cdc…

Authors: Ye Zhu, Yu Wu, Kyle Olszewski, Jian Ren, Sergey Tulyakov, Yan Yan

Summary: To this end, we introduce a Conditional Discrete Contrastive Diffusion (CDCD) loss and design two contrastive diffusion mechanisms to effectively incorporate it into the denoising process.

Abstract: Diffusion probabilistic models (DPMs) have become a popular approach to conditional generation, due to their promising results and support for cross-modal synthesis. A key desideratum in conditional synthesis is to achieve high correspondence between the conditioning input and generated output. Most existing methods learn such relationships implicitly, by incorporating the prior into the variational lower bound. In this work, we take a different route — we enhance input-output connections by maximizing their mutual information using contrastive learning. To this end, we introduce a Conditional Discrete Contrastive Diffusion (CDCD) loss and design two contrastive diffusion mechanisms to effectively incorporate it into the denoising process. We formulate CDCD by connecting it with the conventional variational objectives. We demonstrate the efficacy of our approach in evaluations with three diverse, multimodal conditional synthesis tasks: dance-to-music generation, text-to-image synthesis, and class-conditioned image synthesis. On each, we achieve state-of-the-art or higher synthesis quality and improve the input-output correspondence. Furthermore, the proposed approach improves the convergence of diffusion models, reducing the number of required diffusion steps by more than 35% on two benchmarks, significantly increasing the inference speed.
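
For intuition, the "maximize mutual information between condition and output" idea can be written as an InfoNCE-style contrastive term that treats each sample's own condition as the positive and the rest of the batch as negatives. This is a hedged sketch of that generic mechanism, not the paper's CDCD loss; the feature shapes and the choice of negatives are assumptions.

```python
# Hedged sketch: an InfoNCE-style term pulling a denoised latent toward its paired
# conditioning embedding and pushing it away from other conditions in the batch.
import torch
import torch.nn.functional as F

def contrastive_condition_loss(denoised, cond, temperature=0.07):
    # denoised: (B, d) features of the denoised sample at some diffusion step
    # cond:     (B, d) embeddings of the paired conditioning input (e.g. dance, text)
    z = F.normalize(denoised, dim=-1)
    c = F.normalize(cond, dim=-1)
    logits = z @ c.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = contrastive_condition_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```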

Paper: GLIPv2: Unifying Localization and Vision-Language Understanding

Title: GLIPv2: Unifying Localization and Vision-Language Understanding

Published: 12 Jun 2022

Field: Computer Vision, Natural Language Processing

Tasks: Contrastive Learning, Image Captioning, Instance Segmentation, Language Modelling, Masked Language Modeling, Object Detection, Phrase Grounding, Referring Expression Segmentation, Semantic Segmentation, Visual Question Answering (VQA)

Paper link: arxiv.org/abs/2206.05…

Code: github.com/microsoft/G…

Authors: Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao

Summary: We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning).

Abstract: We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning). GLIPv2 elegantly unifies localization pre-training and Vision-Language Pre-training (VLP) with three pre-training tasks: phrase grounding as a VL reformulation of the detection task, region-word contrastive learning as a novel region-word level contrastive learning task, and the masked language modeling. This unification not only simplifies the previous multi-stage VLP procedure but also achieves mutual benefits between localization and understanding tasks. Experimental results show that a single GLIPv2 model (all model weights are shared) achieves near SoTA performance on various localization and understanding tasks. The model also shows (1) strong zero-shot and few-shot adaption performance on open-vocabulary object detection tasks and (2) superior grounding capability on VL understanding tasks. Code will be released at github.com/microsoft/G…
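
The unification hinges on scoring every region proposal against every word of the text prompt, so that detection, phrase grounding, and region-word contrastive learning can share one head. Below is a hedged sketch of that alignment computation only (not Microsoft's implementation); shapes and normalization are assumptions.

```python
# Hedged sketch: region-word alignment scores, the shared quantity behind grounded
# detection, phrase grounding, and region-word contrastive learning.
import torch
import torch.nn.functional as F

def region_word_scores(region_feats, word_feats):
    # region_feats: (B, R, d) from the image/detection backbone
    # word_feats:   (B, W, d) from the text encoder for the caption/prompt
    r = F.normalize(region_feats, dim=-1)
    w = F.normalize(word_feats, dim=-1)
    return torch.einsum("brd,bwd->brw", r, w)   # (B, R, W) alignment scores

scores = region_word_scores(torch.randn(2, 100, 256), torch.randn(2, 20, 256))
print(scores.shape)  # torch.Size([2, 100, 20])
```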

Paper: Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging

Title: Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging

Published: 20 May 2022

Field: Computer Vision

Tasks: Compressive Sensing, Image Reconstruction, Image Restoration

Paper link: arxiv.org/abs/2205.10…

Code: github.com/caiyuanhao1…

Authors: Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Henghui Ding, Yulun Zhang, Radu Timofte, Luc van Gool

Summary: In coded aperture snapshot spectral compressive imaging (CASSI) systems, hyperspectral image (HSI) reconstruction methods are employed to recover the spatial-spectral signal from a compressed measurement.

Abstract: In coded aperture snapshot spectral compressive imaging (CASSI) systems, hyperspectral image (HSI) reconstruction methods are employed to recover the spatial-spectral signal from a compressed measurement. Among these algorithms, deep unfolding methods demonstrate promising performance but suffer from two issues. Firstly, they do not estimate the degradation patterns and ill-posedness degree from the highly related CASSI to guide the iterative learning. Secondly, they are mainly CNN-based, showing limitations in capturing long-range dependencies. In this paper, we propose a principled Degradation-Aware Unfolding Framework (DAUF) that estimates parameters from the compressed image and physical mask, and then uses these parameters to control each iteration. Moreover, we customize a novel Half-Shuffle Transformer (HST) that simultaneously captures local contents and non-local dependencies. By plugging HST into DAUF, we establish the first Transformer-based deep unfolding method, Degradation-Aware Unfolding Half-Shuffle Transformer (DAUHST), for HSI reconstruction. Experiments show that DAUHST significantly surpasses state-of-the-art methods while requiring cheaper computational and memory costs. Code and models will be released at github.com/caiyuanhao1…
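
The degradation-aware unfolding idea can be pictured as a small, fixed number of stages, each pairing a data-fidelity gradient step with a learned denoiser, where per-stage step sizes are predicted from the compressed measurement and the mask. The sketch below illustrates only that generic structure; the toy parameter estimator, the placeholder A/At operators, and the convolutional "denoisers" are assumptions, not the DAUHST architecture.

```python
# Hedged sketch of a generic degradation-aware unfolding loop (not the DAUHST code).
import torch
import torch.nn as nn

class UnfoldingNet(nn.Module):
    def __init__(self, n_stages=5, channels=28):
        super().__init__()
        self.param_net = nn.Linear(2, n_stages)   # toy estimator: stats of (y, mask) -> step sizes
        self.denoisers = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_stages)
        )

    def forward(self, y, mask, A, At):
        stats = torch.stack([y.mean(), mask.mean()]).unsqueeze(0)
        alphas = torch.sigmoid(self.param_net(stats)).squeeze(0)   # degradation-aware step sizes
        x = At(y)                                                  # initial estimate
        for alpha, denoiser in zip(alphas, self.denoisers):
            x = x - alpha * At(A(x) - y)                           # data-fidelity gradient step
            x = denoiser(x)                                        # learned prior (denoising) step
        return x

# Toy stand-ins for the CASSI forward operator and its adjoint.
A = lambda x: x.sum(dim=1, keepdim=True)
At = lambda y: y.repeat(1, 28, 1, 1)
net = UnfoldingNet()
out = net(torch.randn(1, 1, 32, 32), torch.ones(1, 1, 32, 32), A, At)
print(out.shape)  # torch.Size([1, 28, 32, 32])
```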

Paper: HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

Title: HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

Published: CVPR 2022

Field: Computer Vision

Paper link: arxiv.org/abs/2201.04…

Code: github.com/chungyiweng…

Authors: Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, Ira Kemelmacher-Shlizerman

Summary: Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps.

Abstract: We introduce a free-viewpoint rendering method — HumanNeRF — that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.
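
The key mechanism is the backward warp: each observation-space sample is mapped into the canonical T-pose by a skeletal rigid transform plus a learned non-rigid offset, and colour and density are then queried in canonical space. A heavily simplified, hedged sketch of that flow (a single rigid transform stands in for per-bone skinning; this is not the released code):

```python
# Hedged sketch: backward-warp observation points to a canonical T-pose, then query
# a canonical MLP for colour and density.
import torch
import torch.nn as nn

class CanonicalQuery(nn.Module):
    def __init__(self):
        super().__init__()
        self.nonrigid = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
        self.canonical_nerf = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, x_obs, R, t):
        # x_obs: (N, 3) sample points in the observed frame.
        # R, t: one rigid transform standing in for per-bone skeletal skinning.
        x_skel = x_obs @ R.t() + t                  # skeletal (rigid) backward warp
        x_canon = x_skel + self.nonrigid(x_skel)    # learned non-rigid refinement
        return self.canonical_nerf(x_canon)         # (N, 4): RGB + density in canonical space

model = CanonicalQuery()
out = model(torch.randn(1024, 3), torch.eye(3), torch.zeros(3))
print(out.shape)  # torch.Size([1024, 4])
```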

Paper: SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Title: SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Published: CVPR 2022

Field: Computer Vision

Tasks: Disentanglement, Facial Editing, Image Generation, Transfer Learning

Paper link: arxiv.org/abs/2112.02…

Code: github.com/seasonSH/Se…

Authors: Yichun Shi, Xiao Yang, Yangyue Wan, Xiaohui Shen

Summary: When combined with editing methods designed for StyleGANs, it can achieve a more fine-grained control to edit synthesized or real images.

Abstract: Recent studies have shown that StyleGANs provide promising prior models for downstream tasks on image synthesis and editing. However, since the latent codes of StyleGANs are designed to control global styles, it is hard to achieve a fine-grained control over synthesized images. We present SemanticStyleGAN, where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way. The structure and texture of different local parts are controlled by corresponding latent codes. Experimental results demonstrate that our model provides a strong disentanglement between different spatial areas. When combined with editing methods designed for StyleGANs, it can achieve a more fine-grained control to edit synthesized or real images. The model can also be extended to other domains via transfer learning. Thus, as a generic prior model with built-in disentanglement, it could facilitate the development of GAN-based applications and enable more potential downstream tasks.
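
Compositional generation here means one latent code and one local generator per semantic part, with predicted masks deciding how the parts blend before a shared renderer produces the image. The toy sketch below illustrates that structure only; module sizes, the mask-softmax blending, and the flat linear "generators" are assumptions rather than the released architecture.

```python
# Hedged sketch: per-part latents -> per-part features and masks -> blended image.
import torch
import torch.nn as nn

class CompositionalGenerator(nn.Module):
    def __init__(self, n_parts=4, z_dim=64, feat_ch=32, size=32):
        super().__init__()
        self.size, self.feat_ch = size, feat_ch
        # One tiny "local generator" per semantic part: latent -> features + mask logit.
        self.locals = nn.ModuleList(
            nn.Linear(z_dim, (feat_ch + 1) * size * size) for _ in range(n_parts)
        )
        self.render = nn.Conv2d(feat_ch, 3, 1)       # shared renderer to RGB

    def forward(self, z_parts):
        feats, mask_logits = [], []
        for z, g in zip(z_parts, self.locals):
            out = g(z).view(-1, self.feat_ch + 1, self.size, self.size)
            feats.append(out[:, :-1])                # per-part feature map
            mask_logits.append(out[:, -1:])          # per-part mask logit
        weights = torch.softmax(torch.cat(mask_logits, dim=1), dim=1)   # soft part assignment
        blended = sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))
        return self.render(blended)                  # (B, 3, H, W)

gen = CompositionalGenerator()
img = gen([torch.randn(2, 64) for _ in range(4)])
print(img.shape)  # torch.Size([2, 3, 32, 32])
```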

Paper: 3D-aware Image Synthesis via Learning Structural and Textural Representations

Title: 3D-aware Image Synthesis via Learning Structural and Textural Representations

Published: CVPR 2022

Field: Computer Vision

Tasks: 3D-Aware Image Synthesis, Image Generation

Paper link: arxiv.org/abs/2112.10…

Code: github.com/genforce/vo…

Authors: Yinghao Xu, Sida Peng, Ceyuan Yang, Yujun Shen, Bolei Zhou

Summary: The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis.

Abstract: Making generative models 3D-aware bridges the 2D image space and the 3D physical world yet remains challenging. Recent attempts equip a Generative Adversarial Network (GAN) with a Neural Radiance Field (NeRF), which maps 3D coordinates to pixel values, as a 3D prior. However, the implicit function in NeRF has a very local receptive field, making the generator hard to become aware of the global structure. Meanwhile, NeRF is built on volume rendering which can be too costly to produce high-resolution results, increasing the optimization difficulty. To alleviate these two problems, we propose a novel framework, termed as VolumeGAN, for high-fidelity 3D-aware image synthesis, through explicitly learning a structural representation and a textural representation. We first learn a feature volume to represent the underlying structure, which is then converted to a feature field using a NeRF-like model. The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis. Such a design enables independent control of the shape and the appearance. Extensive experiments on a wide range of datasets show that our approach achieves sufficiently higher image quality and better 3D control than the previous methods.
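
The structure/texture split can be pictured as: a learned feature volume supplies structure, a depth-wise accumulation turns it into a 2D feature map, and a small convolutional renderer adds appearance. The sketch below is a toy illustration under those assumptions; it omits the GAN training, camera/ray sampling, and the NeRF-like implicit field of the actual method.

```python
# Hedged sketch: feature volume (structure) -> accumulated 2D feature map -> renderer (texture).
import torch
import torch.nn as nn

class VolumeToImage(nn.Module):
    def __init__(self, feat_ch=16, depth=8, size=32):
        super().__init__()
        self.volume = nn.Parameter(torch.randn(1, feat_ch, depth, size, size))  # structural prior
        self.to_weight = nn.Conv3d(feat_ch, 1, 1)                               # per-voxel "density"
        self.renderer = nn.Sequential(nn.Conv2d(feat_ch, 32, 3, padding=1),
                                      nn.ReLU(), nn.Conv2d(32, 3, 3, padding=1))

    def forward(self):
        w = torch.softmax(self.to_weight(self.volume), dim=2)   # weights along the depth axis
        feat_2d = (w * self.volume).sum(dim=2)                  # accumulate to a 2D feature map
        return self.renderer(feat_2d)                           # textural rendering to RGB

img = VolumeToImage()()
print(img.shape)  # torch.Size([1, 3, 32, 32])
```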

We are ShowMeAI, dedicated to spreading high-quality AI content and sharing industry solutions, using knowledge to accelerate every step of technical growth! Click to view the list of past issues, and subscribe to the topic #ShowMeAI资讯日报 in the official account to receive the latest daily pushes. Click Collections & Monthly E-zine to quickly browse the full set of each topic.

  • Author: 韩信子@ShowMeAI
  • List of past issues
  • Collections & Monthly E-zine
  • Notice: All rights reserved. To repost, please contact the platform and the author and credit the source.
  • Feel free to reply and like; recommend valuable articles, tools, or suggestions in the comments, and we will get back to you as soon as possible~