Sampling-based approaches for minimizing regret in uncertain Markov decision processes (MDPs), J. Artificial Intelligence Research, vol.59, pp.229-264, 2017.
Solving MDPs with unknown rewards using nondominated vector-valued functions, Proc. 8th European Starting AI Researcher Symposium, pp.15-26, 2016.
Approximate regret based elicitation in Markov decision process, The 2015 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for Future (RIVF), pp.47-52, 2015.
DOI : 10.1109/RIVF.2015.7049873
Strategic advice provision in repeated human-agent interactions, Autonomous Agents and Multi-Agent Systems, vol.5, issue.4, pp.4-29, 2016.
DOI : 10.1080/01621459.1949.10483310
URL : http://www.cs.biu.ac.il/%7Esarit/data/articles/repeated.pdf
Regret in Decision Making under Uncertainty, Operations Research, vol.30, issue.5, pp.961-981, 1982.
DOI : 10.1287/opre.30.5.961
Partitioning procedures for solving mixed-variables programming problems, Numerische Mathematik, vol.38, issue.1, pp.238-252, 1962.
DOI : 10.1007/BF01386316
Preference-based evolutionary direct policy search, ICRA Workshop on Autonomous Learning, 2013.
DOI : 10.1007/s10994-014-5458-8
URL : https://hal.archives-ouvertes.fr/hal-01216088
Policy shaping with human teachers, Proc. 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), pp.3366-3372, 2015.
A geometric approach to find nondominated policies to imprecise reward MDPs, Proc. 8th International Conference on Computing & Communication Technologies-Research, Innovation, and Vision for the Future (IEEE RIVF 2011), pp.439-454, 2011.
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Machine Learning, vol.28, issue.1-2, pp.123-156, 2012.
DOI : 10.1016/S0004-3702(01)00110-2
Reinforcement learning via practice and critique advice, Proc. 24th AAAI Conference on Artificial Intelligence, pp.481-486, 2010.
Creating advice-taking reinforcement learners, Machine Learning, vol.22, issue.1-3, pp.251-281, 1996.
DOI : 10.1007/BF00114730
Algorithms for inverse reinforcement learning, Proc. 17th International Conference on Machine Learning, pp.663-670, 2000.
Learning from Demonstrations: Is It Worth Estimating a Reward Function?, Proc. 1st Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 2013), pp.17-32, 2013.
DOI : 10.1007/978-3-642-40988-2_2
URL : https://hal.archives-ouvertes.fr/hal-00869801
Markov decision processes: discrete stochastic dynamic programming. Wiley series in probability and statistics, 2005.
DOI : 10.1002/9780470316887
Regret-based reward elicitation for Markov decision processes, Proc. 25th Conference on Uncertainty in Artificial Intelligence, pp.444-451, 2009.
Robust policy computation in reward-uncertain MDPs using nondominated policies, Proc. 24th AAAI Conference on Artificial Intelligence, pp.1127-1133, 2010.
Robust online optimization of reward-uncertain MDPs, Proc. 22nd International Joint Conference on Artificial Intelligence, p.2165, 2011.
Interactive value iteration for Markov decision processes with unknown rewards, Proc. 23rd International Joint Conference on Artificial Intelligence, pp.2415-2421, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00942290
Markov decision processes with ordinal rewards: reference point-based preferences, Proc. 21st International Conference on Automated Planning and Scheduling, pp.282-289, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01285812
Ordinal decision models for Markov decision processes, Proc. 20th European Conference on Artificial Intelligence, pp.828-833, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01273056
Parametric regret in uncertain Markov decision processes, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with the 2009 28th Chinese Control Conference, pp.3606-3613, 2009.
DOI : 10.1109/CDC.2009.5400796