Reinforcement Learning Bomberman