To Thanh Dat

Undergraduate Student @ University of Science, VNU-HCM

My general interests are Deep Learning, Computer Vision, and Multimodal Models, along with their applications to real-world problems.
Currently, my research focuses on Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs).

Email: ttdat2419@clc.fitus.edu.vn


Research

Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement
Dat To-Thanh*, Nghia Nguyen-Trong, Hoang Vo, Hieu Bui-Minh, Tinh-Anh Nguyen-Nhu
Published in the Mobile AI Workshop at CVPR 2026

STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models
Tinh-Anh Nguyen-Nhu*, Triet Dao Hoang Minh*, Dat To-Thanh*, Phuc Le-Gia, Tuan Vo-Lan, Tien-Huy Nguyen
Published in the 9th AI City Challenge Workshop at ICCV 2025

Projects


(CVPRW 2026) Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement
Designing an image enhancement model for RGB photos that improves their visual quality to match images taken with a Canon 70D DSLR, while staying efficient enough for real-time use on mobile devices. With only 915K parameters, the 8-bit quantized model achieved 21.050 dB PSNR and 0.725 SSIM on the DPED dataset.
CV, PyTorch, Image Enhancement

Vesuvius Surface Detection
Performing 3D image segmentation to detect surfaces in ancient scrolls. Experimenting with several techniques, including a 2.5D approach using the MONAI library, 3D segmentation using the nnUNetv2 library, and post-processing methods that improve segmentation quality.
CV, PyTorch, 3D Segmentation

(ICCVW 2025) STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models
Enhancing traffic video understanding and captioning by developing a pipeline that integrates spatial and temporal information to boost the performance of existing Vision-Language Models. Designing a novel caption decomposition strategy that covers both the spatial and temporal aspects of traffic videos. Extensive experiments on the AI City Challenge datasets demonstrate the effectiveness of the proposed method, and the pipeline achieved 7th place in Track 2 of the ICCV 2025 AI City Challenge.
PyTorch, Multimodal Models

Segmentation on the Cityscapes Dataset using Mask R-CNN and DeepLabV3
Implementing and comparing Mask R-CNN and DeepLabV3 for semantic segmentation on the Cityscapes dataset. Training both models from scratch and evaluating their performance with metrics such as mIoU and pixel accuracy.
PyTorch, CV, 2D Segmentation

Implementing Models from Research Papers
Implementing research papers in deep learning, computer vision, and natural language processing. This project serves as a personal repository for practicing and understanding various research papers by implementing them from scratch.
Python, NLP, CV, Multimodal Models

Activities


Mentor of Computer Science and Engineering Technology (CSET) Club, Ben Tre High School for Gifted Students
March 2026 - Present
  • Mentoring students in the CSET Club at Ben Tre High School for Gifted Students.
  • Providing guidance and support to help students develop their computer science and engineering skills.
Co-head of AI&DS Team, Google Developer Group on Campus, University of Science, VNU-HCM
October 2025 - Present
  • Proposed ideas for tech-focused activities, events, and workshops.
  • Organized and hosted internal training sessions to help members improve their technical skills.

Blogs

Coming soon...

Unserious