Post-doctoral research visit (F/M) in Data Sciences and Statistical Learning at Université Gustave Eiffel
Multi-source data mining for the analysis of human mobility
Starting date: ASAP
Main Location: Champs-Sur-Marne (77) or Lyon
Duration: Up to 24 months
Level of qualifications required: PhD or equivalent
Remuneration: 2600 € / month (before taxes)
The post-doctoral research position is funded by the ANR PRCE MobiTIC project (https://mobitic.huma-num.fr/). The post-doctoral fellow will join a research team working on statistical learning for mining complex urban data for a duration of 24 months. Several partners are involved in the project: two research laboratories of Université Gustave Eiffel (GRETTIA, the coordinator, and LICIT-ECO7), the SENSE laboratory of Orange Labs, and the SSP lab of INSEE.
MobiTIC's methodology shall produce novel models, analyses and indicators of presence and mobility that are relevant, reliable, compliant with privacy rules, representative and frequently updated. These indicators shall be produced by combining (aggregate) mobile phone network signalling data with other digital and traditional data sources (smart card data, surveys, GPS data, census data, loop detector data, etc.). Digital data open up new perspectives for the dynamic analysis of territories at finer levels of geographical and temporal resolution, and may help actors manage their resources more efficiently, which is a fundamental requirement for the sustainable development of the territories. The project will draw upon the disciplinary fields of statistics, machine learning, and data analysis.
In recent years, a great deal of research has been dedicated to the use of digital data for mobility analysis. Most of these studies focus on a single data source and a single mode of transport. In this respect, we can cite the work carried out by researchers from Université Gustave Eiffel involved in the MobiTIC project on the use of smart card data to analyze the use of shared mobility systems [1, 2] or public transport systems [3], the use of Bluetooth data for road traffic analysis [4], or the use of mobile phone data to reconstruct human trajectories [5, 6], understand urban dynamics [7] and analyze road transport resilience [8].
The MobiTIC project aims to combine several data sources to analyse human presence and mobility.
This postdoc proposal aims to develop data mining and data fusion approaches to mine and enrich aggregate mobile phone signaling data. One of the main issues to be addressed concerns the imputation of the trip mode and purpose [8, 9, 10]. Indeed, mobile phone data alone cannot provide all the information required to build mobility indicators. In addition to socio-demographic information, indicators of human mobility (Origin-Destination matrices) can benefit from other data sources such as ticketing data and electromagnetic loop data for road traffic. The joint use of multiple sources will make it possible to take full advantage of the strengths of each type of source and to overcome their individual limitations and biases.
The postdoc is expected to address, during the two years, the following challenges:
– Explore and comparatively analyze a unique large-scale dataset on human mobility made available by the mobile phone operator Orange and its commercial service Orange Flux Vision in the context of the MobiTIC project. Among the expected goals, the candidate will have to evaluate potential spatiotemporal biases of the data and validate the available inferred (multi-modal) mobility flows against other available sources of human mobility data, including road traffic GPS data, public transportation validations (smart card and ticketing data) and loop detectors. The analysis will focus on the Rhône-Alpes region, France, and exploit classical data mining and machine learning solutions to produce spatio-temporal indicators of human mobility (Origin-Destination matrices, travelled distances, radius of gyration, etc.), possibly by travel mode and trip purpose.
– Apply data fusion techniques to combine the available heterogeneous multi-source data (private road traffic, public transport, mobility indicators derived from mobile phone data and surveys) to generate synthetic populations that can reproduce, realistically and anonymously, the mobility flows in the case-study area. Synthetic populations can also be used for agent-based microsimulation. To this end, existing work from the team and the literature concerning data-driven probabilistic population synthesis based on Iterative Proportional Fitting (IPF), Markov Chains, Bayesian Networks and Generalized Raking will be the basis for further development.
– Detect and explore human mobility behaviours in the presence of atypical situations (e.g. bad weather, accidents, extreme events, pandemics) to identify and describe how people react to abnormal, sudden or unexpected situations. A better knowledge of such behaviours is of fundamental importance in the field of transport to properly calibrate the mobility offer and to implement new control solutions that can improve the resilience of the transport system. To this end, the available large-scale mobile phone data could be exploited and cross-referenced with other data sources (web news, social network data, historical weather data) that contain information on abnormal situations impacting mobility. In particular, the available datasets will cover a long historical time frame, including the periods before, during and after the COVID-19 pandemic, thus allowing for a detailed evaluation of how mobility practices evolved in response to it.
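As an illustration of the Iterative Proportional Fitting (IPF) technique mentioned in the second challenge, here is a minimal sketch in Python. The seed table, zone and mode labels, and target margins are hypothetical toy values and do not come from the project data. The sketch assumes a strictly positive two-dimensional seed and target margins with equal totals; it is not the team's implementation.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until its margins match the targets.

    Assumes every seed cell is strictly positive and that
    sum(row_targets) == sum(col_targets).
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [v * target / row_sum for v in table[i]]
        # Column step: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Columns are exact after the column step, so only rows are checked.
        if all(abs(sum(table[i]) - t) <= tol
               for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical example: rows are two home zones, columns are two travel modes.
seed = [[1, 2],
        [3, 4]]            # e.g. counts observed in a small survey sample
zone_totals = [40, 60]     # assumed known population per zone
mode_totals = [30, 70]     # assumed known trips per mode

fitted = ipf(seed, zone_totals, mode_totals)
for row in fitted:
    print([round(v, 2) for v in row])
```

The fitted table matches both sets of margins while preserving the association structure (cross-product ratio) of the seed, which is why IPF is a common starting point for population synthesis.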
Candidates for the post-doctoral position must already hold a Ph.D. in computer science, applied mathematics or statistics. Good programming skills in Python and/or R are expected, as well as advanced knowledge of the data mining and machine learning libraries available in these languages (e.g., pandas, scipy, sklearn). Skills in distributed data mining (Spark, PySpark) will be beneficial. Strong motivation to work on an interdisciplinary project with applied objectives will be appreciated. Research paper writing skills are a must.
To apply, candidates should send a CV, a cover letter, and the contact details of two referees to the contacts listed below.
About Université Gustave Eiffel
The candidate will be hosted in the GRETTIA laboratory (Engineering of Land Transport Networks and Advanced Computing) at Université Gustave Eiffel, Marne-la-Vallée campus. Trips to the Lyon campus of Université Gustave Eiffel are to be expected.
14-20 Bd Newton, 77420 Champs-sur-Marne
Tel.: +33 (0)1 81 66 87 19
14-20 Bd Newton, 77420 Champs-sur-Marne
Tel.: +33 (0)1 81 66 87 18
25 Avenue François Mitterrand, 69500 Bron
Tel.: +33 (0)4 78 65 68 70
[1] Côme, E., & Oukhellou, L. (2014). Model-based count series clustering for Bike-sharing system usage mining, a case study with the Vélib system of Paris. ACM Transactions on Intelligent Systems and Technology (TIST), 5(3).
[2] Briand, A. S., Côme, E., Trépanier, M., & Oukhellou, L. (2017). Analyzing year-to-year changes in public transport passenger behaviour using smart card data. Transportation Research Part C, 79, 274-289.
[3] El Mahrsi, M. K., Côme, E., Oukhellou, L., & Verleysen, M. (2017). Clustering Smart Card Data for Urban Mobility Analysis. IEEE Transactions on Intelligent Transportation Systems, 18(3), 1-17.
[4] Laharotte, P.-A., Billot, R., Côme, E., Oukhellou, L., Nantes, A., & El Faouzi, N.-E. (2015). Spatiotemporal Analysis of Bluetooth Data: Application to a Large Urban Network. IEEE Transactions on Intelligent Transportation Systems, 16(3), 1439-1448.
[5] Bonnetain, L., Furno, A., El Faouzi, N. E., Fiore, M., Stanica, R., Smoreda, Z., & Ziemlicki, C. (2021). TRANSIT: Fine-grained human mobility trajectory inference at scale with mobile network signaling data. Transportation Research Part C: Emerging Technologies, 130, 103257.
[6] Bonnetain, L., Furno, A., Krug, J., & El Faouzi, N. E. (2019). Can We Map-Match Individual Cellular Network Signaling Trajectories in Urban Environments? Data-Driven Study. Transportation Research Record, 2673(7), 74-88.
[7] Furno, A., Fiore, M., Stanica, R., Ziemlicki, C., & Smoreda, Z. (2017). A Tale of Ten Cities: Characterizing Signatures of Mobile Traffic in Urban Areas. IEEE TMC, 16(10).
[8] Henry, E., Bonnetain, L., Furno, A., El Faouzi, N. E., & Zimeo, E. (2019, June). Spatio-temporal Correlations of Betweenness Centrality and Traffic Metrics. In 6th International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS).
[9] Bonnel, P., Hombourger, E., Olteanu-Raimond, A.-M., & Smoreda, Z. (2015). Passive Mobile Phone Dataset to Construct Origin-destination Matrix: Potentials and Limitations. Transportation Research Procedia, 11, 381-398. doi:10.1016/j.trpro.2015.12.032.
[10] Hörl, S., & Balac, M. (2021). Synthetic population and travel demand for Paris and Île-de-France based on open and publicly available data. Transportation Research Part C: Emerging Technologies, 130, 103291.
[11] Zhang, D., Cao, J., Feygin, S., Tang, D., Shen, Z. J. M., & Pozdnoukhov, A. (2019). Connected population synthesis for transportation simulation. Transportation Research Part C: Emerging Technologies, 103, 1-16.
[12] Sun, L., & Erath, A. (2015). A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies, 61, 49-62.
[13] Zhou, M., Li, J., Basu, R., & Ferreira, J. (2022). Creating spatially-detailed heterogeneous synthetic populations for agent-based microsimulation. Computers, Environment and Urban Systems, 91, 101717.