A massive body lying between the Earth and a gravitational-wave source produces gravitational lensing. In the strong-lensing regime, this leads to the observation of multiple deformed copies of the original wave. Machine-learning models have been proposed to identify these copies much faster than optimal Bayesian methods, a speed-up that will be needed at the detection rates expected from next-generation detectors. Most of these models rely on a time-frequency representation of the data, which discards the phase information. We introduce a neural network that operates directly on the time-series data in order to retain the phase, limit the preprocessing time, and keep a one-dimensional input. We show that our model is more efficient than the base model used on time-frequency maps at any false-alarm rate, up to ~5 times more at a false-alarm rate of 1e-4. We also show that it is not significantly impacted by the choice of waveform model, by lensing-induced phase shifts, or by reasonable errors on the merger time, which induce a misalignment of the waves in the input.