Reinforcement Learning on bluff poker

Image generated using NanoBanana.

As a fun side project and a way to play around with reinforcement learning, I decided to make an agent that learns to play “bluff poker”, a dice game (Dutch, as far as I could find) where the objective is to always pass on a higher value than you received. Since you often don't actually get a higher value, you need to bluff and hope the next player believes you.
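To make that rule concrete, here is a rough sketch in Python. It is not the code from the repo and it heavily simplifies the game: values are plain integers on a made-up scale (`MAX_VALUE` is an assumption, the real game ranks dice combinations), and the function names `is_legal_claim`, `is_bluff` and `naive_claim` are hypothetical helpers invented for this illustration.

```python
from typing import Optional

# Assumption: values are plain integers from 1 to MAX_VALUE.
# The real game ranks dice combinations; this scale is made up for illustration.
MAX_VALUE = 10


def is_legal_claim(claimed: int, received: int) -> bool:
    """A claim is legal only if it is strictly higher than the claim you received."""
    return received < claimed <= MAX_VALUE


def is_bluff(claimed: int, actual: int) -> bool:
    """You are bluffing when you claim more than you actually hold."""
    return claimed > actual


def naive_claim(actual: int, received: int) -> Optional[int]:
    """Tell the truth when possible, otherwise bluff with the smallest legal claim.

    Returns None when no legal claim exists (the received claim is already the maximum).
    """
    if actual > received:
        return actual  # honest claim, already higher than what was received
    if received < MAX_VALUE:
        return received + 1  # forced bluff: the smallest value that is still legal
    return None  # nothing higher left to claim


if __name__ == "__main__":
    received = 4
    for actual in (7, 3):
        claim = naive_claim(actual, received)
        print(f"received={received} actual={actual} claim={claim} "
              f"bluff={is_bluff(claim, actual)}")
```

A fixed strategy like `naive_claim` is only a baseline. The interesting part for a learning agent is deciding how high to bluff and when to call out the previous player, neither of which this sketch covers.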

As far as I know, nobody has built an agent for this game yet, so I figured it would be fun. I play it with a group once a month, and my goal was to make something that could play against the other (real) players.

So far I have the Gymnasium environment set up, but the agent still takes terrible actions. The repo can be found here: GitHub Repo