Deep contrastive learning for feature alignment: Insights from housing-household relationship inference
Xiao Qian, Shangjia Dong, Rachel Davidson,
Deep contrastive learning for feature alignment: Insights from housing-household relationship inference,
Computers, Environment and Urban Systems,
Volume 121,
2025,
102328,
ISSN 0198-9715,
https://doi.org/10.1016/j.compenvurbsys.2025.102328.
(https://www.sciencedirect.com/science/article/pii/S019897152500081X)
Abstract: Housing and household characteristics are key determinants of social and economic well-being, yet our understanding of their interrelationships remains limited. This study addresses this knowledge gap by developing a deep contrastive learning (DCL) model to infer housing-household relationships from the American Community Survey (ACS) Public Use Microdata Sample (PUMS). More broadly, the proposed model suits a class of problems in which the goal is to learn joint relationships between two distinct entities without explicitly labeled ground-truth data. The dual-encoder DCL approach leverages co-occurrence patterns in PUMS and introduces a bisect K-means clustering method to overcome the absence of ground-truth labels. Its architecture is designed to handle the semantic differences between housing (building) and household (people) features while mitigating the noise introduced by clustering. To validate the model, we generate a synthetic ground-truth dataset and conduct comprehensive evaluations. The model also outperforms state-of-the-art methods in capturing housing-household relationships in Delaware. A transferability test in North Carolina confirms its generalizability across diverse sociodemographic and geographic contexts. Finally, a post-hoc explainable AI analysis using SHAP values reveals that tenure status and mortgage information play a more significant role in housing-household matching than traditionally emphasized factors such as the number of persons and rooms.
Keywords: Feature alignment; Heterogeneous data; Deep contrastive learning; Housing-household inference; Explainable AI
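
Illustrative note: the abstract describes a dual-encoder contrastive setup in which housing and household records that co-occur in PUMS are pulled together in a shared embedding space. The sketch below shows one generic way such an objective is commonly written, a symmetric InfoNCE loss over paired embeddings. It is not the authors' implementation: the function names, the temperature value, and the use of row-aligned pairs as positives are assumptions made for illustration, and the clustering step mentioned in the abstract is omitted.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(housing_emb, household_emb, temperature=0.1):
    """Symmetric InfoNCE loss for row-aligned embedding pairs.

    Assumption: row i of housing_emb and row i of household_emb form a
    positive (co-occurring) pair; all other rows in the batch are negatives.
    """
    h = l2_normalize(housing_emb)
    p = l2_normalize(household_emb)
    logits = h @ p.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(h))
    loss_h2p = -log_softmax(logits)[idx, idx].mean()    # housing -> household
    loss_p2h = -log_softmax(logits.T)[idx, idx].mean()  # household -> housing
    return 0.5 * (loss_h2p + loss_p2h)

# Hypothetical usage: random vectors stand in for the two encoders' outputs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4))
aligned_loss = symmetric_info_nce(emb, emb)           # pairs match exactly
mismatched_loss = symmetric_info_nce(emb, emb[::-1])  # pairs deliberately misaligned
```

In this toy setting the correctly paired batch yields the lower loss, which is the behavior a contrastive objective of this kind is meant to reward.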