Sampling-based approaches for minimizing regret in uncertain Markov decision processes (MDPs), J. Artificial Intelligence Research, vol.59, pp.229-264, 2017.
Solving MDPs with unknown rewards using nondominated vector-valued functions, Proc. 8th European Starting AI Researcher Symposium, pp.15-26, 2016.
Approximate regret based elicitation in Markov decision process, The 2015 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for Future (RIVF), pp.47-52, 2015.
DOI : 10.1109/RIVF.2015.7049873
Strategic advice provision in repeated human-agent interactions, Autonomous Agents and Multi-Agent Systems, vol.5, issue.4, pp.4-29, 2016.
DOI : 10.1080/01621459.1949.10483310
URL : http://www.cs.biu.ac.il/%7Esarit/data/articles/repeated.pdf
Regret in Decision Making under Uncertainty, Operations Research, vol.30, issue.5, pp.961-981, 1982.
DOI : 10.1287/opre.30.5.961
Partitioning procedures for solving mixed-variables programming problems, Numerische Mathematik, vol.38, issue.1, pp.238-252, 1962.
DOI : 10.1007/BF01386316
Preference-based evolutionary direct policy search, ICRA Workshop on Autonomous Learning, 2013.
DOI : 10.1007/s10994-014-5458-8
URL : https://hal.archives-ouvertes.fr/hal-01216088
Policy shaping with human teachers, Proc. 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), pp.3366-3372, 2015.
A geometric approach to find nondominated policies to imprecise reward MDPs, Proc. 8th International Conference on Computing & Communication Technologies-Research, Innovation, and Vision for the Future (IEEE RIVF 2011), pp.439-454, 2011.
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Machine Learning, vol.28, issue.1-2, pp.123-156, 2012.
DOI : 10.1016/S0004-3702(01)00110-2
Reinforcement learning via practice and critique advice, Proc. 24th AAAI Conference on Artificial Intelligence, pp.481-486, 2010.
Creating advice-taking reinforcement learners, Machine Learning, vol.22, issue.1-3, pp.251-281, 1996.
DOI : 10.1007/BF00114730
Algorithms for inverse reinforcement learning, Proc. 17th International Conference on Machine Learning, pp.663-670, 2000.
Learning from Demonstrations: Is It Worth Estimating a Reward Function?, Proc. 1st Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 2013), pp.17-32, 2013.
DOI : 10.1007/978-3-642-40988-2_2
URL : https://hal.archives-ouvertes.fr/hal-00869801
Markov decision processes: discrete stochastic dynamic programming. Wiley series in probability and statistics, 2005.
DOI : 10.1002/9780470316887
Regret-based reward elicitation for Markov decision processes, Proc. 25th Conference on Uncertainty in Artificial Intelligence, pp.444-451, 2009.
Robust policy computation in reward-uncertain MDPs using nondominated policies, Proc. 24th AAAI Conference on Artificial Intelligence, pp.1127-1133, 2010.
Robust online optimization of reward-uncertain MDPs, Proc. 22nd International Joint Conference on Artificial Intelligence, p.2165, 2011.
Interactive value iteration for Markov decision processes with unknown rewards, Proc. 23rd International Joint Conference on Artificial Intelligence, pp.2415-2421, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00942290
Markov decision processes with ordinal rewards: reference point-based preferences, Proc. 21st International Conference on Automated Planning and Scheduling, pp.282-289, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01285812
Ordinal decision models for Markov decision processes, Proc. 20th European Conference on Artificial Intelligence, pp.828-833, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01273056
Parametric regret in uncertain Markov decision processes, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with the 2009 28th Chinese Control Conference, pp.3606-3613, 2009.
DOI : 10.1109/CDC.2009.5400796