The IoT (Internet of Things) technology has been widely adopted in recent
years and has profoundly changed the people's daily lives. However, in the
meantime, such a fast-growing technology has also introduced new privacy
issues, which need to be better understood and measured. In this work, we look
into how private information can be leaked from network traffic generated in
the smart home network. Although researchers have proposed techniques to infer
IoT device types or user behaviors under clean experiment setup, the
effectiveness of such approaches become questionable in the complex but
realistic network environment, where common techniques like Network Address and
Port Translation (NAPT) and Virtual Private Network (VPN) are enabled. Traffic
analysis using traditional methods (e.g., through classical machine-learning
models) is much less effective under those settings, as the features picked
manually are not distinctive any more. In this work, we propose a traffic
analysis framework based on sequence-learning techniques like LSTM and
leveraged the temporal relations between packets for the attack of device
identification. We evaluated it under different environment settings (e.g.,
pure-IoT and noisy environment with multiple non-IoT devices). The results
showed our framework was able to differentiate device types with a high
accuracy. This result suggests IoT network communications pose prominent
challenges to users' privacy, even when they are protected by encryption and
morphed by the network gateway. As such, new privacy protection methods on IoT
traffic need to be developed towards mitigating this new issue.