SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization [paper] [code]

Synthetic Customization Dataset (SynCD) consists of multiple images of the same object in different contexts. We achieve this by promoting consistent object identity, either explicitly through 3D object assets or, more implicitly, through masked shared attention across different views during generation. Given this training data, we train a new encoder-based model for the task, which can successfully generate new compositions of a reference object from text prompts. You can download our dataset here.
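To make the masked shared attention idea concrete, below is a minimal, hypothetical PyTorch sketch (not the repository's actual implementation): each jointly generated view attends to all of its own tokens plus only the foreground (object) tokens of the other views, so identity is shared while backgrounds stay independent. The function name, tensor layout, and `fg_mask` are our own illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def masked_shared_attention(q, k, v, fg_mask):
    """Minimal sketch of masked shared attention across N jointly generated views.

    q, k, v:  (N, L, D) per-view query/key/value tokens.
    fg_mask:  (N, L) boolean, True where a token lies on the shared object.
    """
    N, L, D = q.shape
    # Concatenate every view's tokens into one shared sequence.
    q_all = q.reshape(1, N * L, D)
    k_all = k.reshape(1, N * L, D)
    v_all = v.reshape(1, N * L, D)

    # allowed[i, j] is True iff query token i may attend to key token j:
    # block-diagonal blocks give full self-attention within each view,
    # foreground columns expose the object tokens of every view.
    same_view = torch.eye(N, dtype=torch.bool)
    same_view = same_view.repeat_interleave(L, dim=0).repeat_interleave(L, dim=1)
    foreground = fg_mask.reshape(1, N * L).expand(N * L, N * L)
    allowed = same_view | foreground  # (N*L, N*L)

    out = F.scaled_dot_product_attention(q_all, k_all, v_all, attn_mask=allowed)
    return out.reshape(N, L, D)

# Toy usage: 3 views, 16 tokens each, 64-dim features.
q = k = v = torch.randn(3, 16, 64)
fg_mask = torch.zeros(3, 16, dtype=torch.bool)
fg_mask[:, :8] = True  # pretend the first 8 tokens of each view are the object
out = masked_shared_attention(q, k, v, fg_mask)  # (3, 16, 64)
```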

Our model supports multiple input images of the same object as references. You can upload up to 3 images; results are better with 3 reference images than with 1.
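If you prefer to call the hosted demo from Python rather than the web UI, the `gradio_client` package can invoke a Space's endpoints. The sketch below is hypothetical: the Space id, endpoint name, and argument order are assumptions, so check the Space's "Use via API" page for the real signature.

```python
# Hypothetical sketch of querying the demo programmatically. The Space id,
# endpoint name, and argument order are assumptions -- verify them against
# the Space's API page before use.
from gradio_client import Client, handle_file

client = Client("nupurkmr9/SynCD")  # assumed Space id
result = client.predict(
    "A photo of the object on a beach at sunset",  # text prompt
    handle_file("img1.jpg"),                       # reference image 1
    handle_file("img2.jpg"),                       # reference image 2 (optional)
    handle_file("img3.jpg"),                       # reference image 3 (optional)
    3.5,                                           # guidance scale
    42,                                            # seed
    True,                                          # rigid_object
    api_name="/generate",                          # assumed endpoint name
)
print(result)
```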

HF Spaces often run into errors due to quota limitations, so we recommend running the demo locally.
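One way to run the demo locally is to download the Space repository and launch the Gradio app yourself. The sketch below uses `huggingface_hub.snapshot_download`; the Space id `nupurkmr9/SynCD` is an assumption, so substitute the id shown on the Space page.

```python
# Hypothetical sketch: download the Space repo, then install its
# requirements and launch the app. The Space id is an assumption.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="nupurkmr9/SynCD", repo_type="space")
print(f"Space downloaded to {local_dir}")
# Then, from that directory:
#   pip install -r requirements.txt
#   python app.py
```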

The demo exposes the following controls:

- Guidance Scale: ranges from 1 to 5.
- Seed: ranges from 0 to 2147483647.
- rigid_object: whether the input is a rigid object or a deformable one such as pet animals or wearables.
- Enable CPU Offload: turn this on to avoid out-of-memory issues.
Examples
Preset examples are provided in the demo; each specifies a prompt (a more descriptive prompt leads to better results), up to three reference images (img1, img2, img3), a guidance scale, a seed, and the rigid_object flag.

Citation
If you find this repository useful, please consider giving it a star ⭐ and a citation.

@article{kumari2025syncd,
  title={Generating Multi-Image Synthetic Data for Text-to-Image Customization},
  author={Kumari, Nupur and Yin, Xi and Zhu, Jun-Yan and Misra, Ishan and Azadi, Samaneh},
  journal={arXiv},
  year={2025}
}

Contact
If you have any questions, please feel free to open an issue or reach out to us directly via email.

Acknowledgement
This space was modified from the OmniGen space.