Video compression plays a pivotal role in managing the storage and transmission of multimedia content, especially in bandwidth-constrained environments. Volumetric video has recently emerged as an attractive multimedia application that provides highly immersive viewing experiences. However, streaming volumetric video demands prohibitively high bandwidth, so effectively compressing its underlying point cloud frames is essential to deploying it. Existing compression techniques are either 3D-based or 2D-based, but both have drawbacks in practice: 2D-based methods compress videos effectively but slowly, while 3D-based methods offer high coding speeds but low compression ratios. In this paper, we propose a 3D-based compression framework that achieves both a high compression ratio and a real-time decoding speed. The framework integrates traditional techniques such as motion estimation and Discrete Cosine Transform (DCT) coding with machine learning methodologies, aiming to improve compression efficiency while maintaining high-quality video output through intelligent prediction and optimization.
1. Introduction
Volumetric video represents 3D scenes with exceptional realism and interactivity, driving innovation in VR, AR, and MR. However, these videos generate massive data, creating challenges in storage, transmission, and real-time streaming. This study proposes a real-time compression framework that balances compression efficiency with computational performance to make volumetric video more accessible and streamable in real-world applications.
2. Related Work
A. Video Streaming
Streaming is widespread via services like Netflix, YouTube, and Twitch, supporting formats up to 4K HDR on various devices. Quality depends on bandwidth and device capability.
B. Cloud Compression
Compression before cloud storage or transfer reduces file sizes, lowers costs, and improves upload/download speeds. It’s essential for handling large media datasets efficiently.
3. Methodology
The framework combines hardware-accelerated techniques and adaptive algorithms to handle volumetric video efficiently.
A. Patch-Based Encoding
Videos are split into 3D patches, each encoded using VVC (Versatile Video Coding).
Localized compression and adaptive quality based on content relevance (foreground vs. background) reduce data without sacrificing key visuals.
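The paper does not specify how per-patch quality is chosen; a minimal sketch of content-adaptive quality selection, assuming a hypothetical per-patch relevance score in [0, 1] (higher means foreground) and illustrative quantization parameter (QP) values, might look like:

```python
import numpy as np

def assign_patch_qp(patches, base_qp=32, fg_qp=22):
    """Assign a quantization parameter (QP) per 3D patch.

    Patches whose mean relevance score exceeds 0.5 (foreground) get the
    lower, finer QP; background patches keep the coarser base QP.
    The threshold and QP values are illustrative, not from the paper.
    """
    qps = []
    for patch in patches:
        relevance = float(np.mean(patch["relevance"]))
        qps.append(fg_qp if relevance > 0.5 else base_qp)
    return qps

# Two hypothetical patches: one mostly foreground, one mostly background.
patches = [
    {"relevance": np.array([0.9, 0.8, 0.95])},  # foreground patch
    {"relevance": np.array([0.1, 0.2, 0.05])},  # background patch
]
print(assign_patch_qp(patches))  # [22, 32]
```

In a real encoder the QP list would be passed to the VVC encoder's per-region rate control rather than applied directly.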
B. Adaptive Compression
Region of Interest (ROI) encoding prioritizes viewer-relevant areas.
Compression dynamically adapts based on viewer position to optimize experience.
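One way to realize viewer-position-driven adaptation is to scale the quantization parameter with each patch's distance from the viewer. The sketch below does this linearly; all names, ranges, and thresholds are illustrative assumptions, not details from the paper:

```python
import numpy as np

def viewer_adaptive_qp(patch_centers, viewer_pos,
                       qp_near=20, qp_far=40, max_dist=10.0):
    """Scale QP linearly with a patch's distance from the viewer.

    Nearby patches receive qp_near (fine quantization); patches at or
    beyond max_dist receive qp_far (coarse quantization).
    """
    centers = np.asarray(patch_centers, dtype=float)
    dists = np.linalg.norm(centers - np.asarray(viewer_pos, dtype=float), axis=1)
    t = np.clip(dists / max_dist, 0.0, 1.0)   # normalized distance in [0, 1]
    return np.round(qp_near + t * (qp_far - qp_near)).astype(int)

# Three patches at increasing distance from a viewer at the origin.
qps = viewer_adaptive_qp([[0, 0, 1], [0, 0, 5], [0, 0, 20]], viewer_pos=[0, 0, 0])
print(qps)  # [22 30 40]
```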
C. Real-Time Encoding & Decoding
Utilizes multi-core and parallel processing with pipeline architecture to minimize latency.
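The exact pipeline architecture is not specified; a toy illustration of stage-wise parallel processing over frames, using Python's thread pool with placeholder transform and quantization stages, could be:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(frame):
    # Stage 1: placeholder transform step (a real encoder would apply DCT).
    return [x * 2 for x in frame]

def quantize(frame):
    # Stage 2: placeholder quantization step (integer division as a stand-in).
    return [x // 3 for x in frame]

def encode_frames(frames, workers=4):
    """Run the two pipeline stages over all frames using a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        transformed = list(pool.map(transform, frames))
        return list(pool.map(quantize, transformed))

frames = [[1, 2, 3], [4, 5, 6]]
print(encode_frames(frames))  # [[0, 1, 2], [2, 3, 4]]
```

A production pipeline would overlap the stages (frame N in stage 2 while frame N+1 is in stage 1) rather than running each stage to completion, but the worker-pool structure is the same.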
FFmpeg is used to decode the compressed videos, with AAC audio support.
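A plausible FFmpeg invocation for this decoding step can be assembled as below; the concrete flags and file names are assumptions, not taken from the paper:

```python
def build_decode_cmd(src, dst):
    """Build an FFmpeg command that decodes a compressed video to raw
    frames while decoding the AAC audio track to uncompressed PCM."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "rawvideo",    # decode video to raw frames
        "-c:a", "pcm_s16le",   # decode AAC audio to 16-bit PCM
        dst,
    ]

cmd = build_decode_cmd("compressed.mp4", "decoded.avi")
print(" ".join(cmd))
# In a real pipeline one would execute this with subprocess.run(cmd, check=True).
```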
4. Results
High-quality video compression reduces file size by over 50%, maintaining visual fidelity.
The framework supports both conventional and volumetric video compression for streaming use cases.
5. Input/Output Video Properties
Figures and examples compare the original and compressed video, showcasing drastic size reduction with retained quality.
6. Conclusion
The proposed video compression methodology showcases a comprehensive approach to enhancing the storage efficiency and transmission quality of digital video content. By leveraging techniques such as quadtree decomposition, discrete cosine transform (DCT) coding, motion estimation, and advanced compression parameters, the workflow effectively reduces redundancy while preserving visual fidelity.
Video compression frameworks play a crucial role in managing and optimizing the storage and transmission of video content. By employing various compression techniques and algorithms, these frameworks significantly reduce file sizes while maintaining acceptable quality levels, enabling efficient streaming and playback across diverse devices and network conditions.
A. Application Scope
Video compression frameworks are indispensable in numerous applications, from online video streaming services and video conferencing to multimedia storage and broadcasting. Their role is fundamental in enabling seamless and high-quality video experiences for users worldwide.
B. Key Contributions and Findings
1) Efficient Compression Techniques: The integration of quadtree decomposition allows adaptive partitioning of frames into variable-sized blocks, optimizing the encoding process based on spatial complexity. This method not only reduces bitrate but also enhances compression efficiency by focusing computational resources on significant image regions.
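A minimal sketch of variance-driven quadtree decomposition follows; the variance threshold and minimum block size are illustrative choices:

```python
import numpy as np

def quadtree_split(block, threshold=100.0, min_size=8):
    """Recursively split a square block into quadrants while its pixel
    variance exceeds the threshold; returns (y, x, size) leaf blocks.
    High-variance (complex) regions end up with smaller blocks."""
    def recurse(y, x, size):
        region = block[y:y + size, x:x + size]
        if size <= min_size or region.var() <= threshold:
            return [(y, x, size)]
        h = size // 2
        return (recurse(y, x, h) + recurse(y, x + h, h) +
                recurse(y + h, x, h) + recurse(y + h, x + h, h))
    return recurse(0, 0, block.shape[0])

flat = np.zeros((16, 16))            # uniform block: no split needed
print(len(quadtree_split(flat)))     # 1
noisy = np.random.default_rng(0).integers(0, 256, (16, 16)).astype(float)
print(len(quadtree_split(noisy)))    # 4 (split once into 8x8 leaves)
```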
2) Motion Estimation and Compensation: Full search motion estimation combined with DCT-based coding facilitates accurate prediction of inter-frame motion, reducing temporal redundancy. This approach improves compression ratios without compromising perceptual quality, crucial for maintaining smooth video playback and minimizing storage requirements.
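Full-search motion estimation can be sketched as an exhaustive SAD (sum of absolute differences) search over a small displacement window; the block and search sizes here are illustrative:

```python
import numpy as np

def full_search(ref, cur_block, top, left, search_range=4):
    """Exhaustive block matching: test every displacement within
    +/- search_range and return the motion vector minimizing SAD."""
    bh, bw = cur_block.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            sad = np.abs(ref[y:y + bh, x:x + bw] - cur_block).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, float(best_sad)

ref = np.zeros((16, 16)); ref[6:10, 6:10] = 255   # object in reference frame
cur = np.zeros((16, 16)); cur[8:12, 8:12] = 255   # same object shifted by (2, 2)
block = cur[8:12, 8:12]
print(full_search(ref, block, top=8, left=8))     # ((-2, -2), 0.0)
```

After the best vector is found, a real codec would DCT-code the residual between the block and its motion-compensated prediction.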
3) Quality Assessment and Enhancement: The inclusion of quality assessment metrics such as PSNR provides quantitative measures of compression effectiveness. Techniques like non-local means denoising and aggressive resizing further refine video quality, ensuring that compressed outputs meet perceptual expectations.
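PSNR itself is a standard metric; a small reference implementation for 8-bit frames:

```python
import numpy as np

def psnr(original, compressed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    mse = np.mean((original.astype(float) - compressed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

orig = np.full((8, 8), 128.0)
noisy = orig + 2.0                   # uniform error of 2 -> MSE = 4
print(round(psnr(orig, noisy), 2))   # 42.11
```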
4) Practical Implementation: The utilization of external tools like FFmpeg for audio handling and final video compilation enhances the workflow's robustness and scalability. This integration streamlines the process of combining compressed video with optimized audio, facilitating seamless multimedia content delivery.
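The final compilation step can be done with a stream-copy mux, which combines the compressed video and optimized audio without re-encoding either; the specific FFmpeg flags and file names below are assumptions:

```python
def build_mux_cmd(video, audio, out):
    """Build an FFmpeg command that muxes a compressed video stream with
    a separately prepared AAC audio track, copying both without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", audio,
        "-c:v", "copy",             # keep compressed video as-is
        "-c:a", "copy",             # keep AAC audio as-is
        "-map", "0:v:0",            # video from the first input
        "-map", "1:a:0",            # audio from the second input
        out,
    ]

cmd = build_mux_cmd("compressed.mp4", "audio.aac", "final.mp4")
print(" ".join(cmd))
# In a real pipeline one would execute this with subprocess.run(cmd, check=True).
```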