My answer is not based on research; it is simply how I approach simulating tactics on a small scale.
This is how I made my simulation:
The first thing you need to do is make sure you can actually implement the tactics you want to compare. If there is no way to program them, there is no way to test them on your own.
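To get a feel for what "being able to program a tactic" means, here is a minimal sketch in Python. The game state fields and the action names are my own assumptions for illustration, not part of any particular game:

```python
# Sketch only: a tactic qualifies for the simulation if its decision can be
# written as a function of the visible game state. GameState's fields and the
# "play"/"pass" actions are invented for this example.
from dataclasses import dataclass
from typing import List

@dataclass
class GameState:
    hand: List[int]            # the bot's own cards, simplified to ints
    cards_on_table: List[int]  # what everyone can see

class Tactic:
    def choose_action(self, state: GameState) -> str:
        """Return 'play' or 'pass' based only on what the bot can see."""
        raise NotImplementedError

class AlwaysPlay(Tactic):
    def choose_action(self, state: GameState) -> str:
        return "play"

class PassOnWeakHands(Tactic):
    def choose_action(self, state: GameState) -> str:
        # A rule you can actually program: pass when the hand value is low.
        return "pass" if sum(state.hand) < 15 else "play"
```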
Write the scope of the test down. You could expand your project every time you have a new idea, but if you do that you are creating a project that will never finish. So decide up front what you want to do and consider every option. Bluffing is a good example: it is almost impossible to simulate bluffing properly. You can calculate the chance that you have to win, but a normal person bluffs, for example, when he thinks the other players will pass or that he will win anyway. Implementing "feelings" in a program is a lot of work, and not everyone has the same feelings about bluffing. You could, however, define different bluff sets (feelings) and let the bot use each of them with every tactic. That is a lot of work; in my case it was not worth it.
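A bluff set does not have to model feelings literally; it can just be a few numbers that control when the bot bluffs. The following sketch is purely illustrative, and the field names and thresholds are my own assumptions:

```python
# Hypothetical "bluff sets": each profile is a small set of parameters that a
# tactic can be combined with, instead of a real model of feelings.
from dataclasses import dataclass
import random

@dataclass
class BluffProfile:
    name: str
    bluff_chance: float    # base probability of attempting a bluff
    min_win_chance: float  # only bluff if the estimated win chance is above this

def should_bluff(profile: BluffProfile, estimated_win_chance: float) -> bool:
    if estimated_win_chance < profile.min_win_chance:
        return False
    return random.random() < profile.bluff_chance

# Run every tactic once with each profile, e.g.:
profiles = [
    BluffProfile("timid", bluff_chance=0.05, min_win_chance=0.6),
    BluffProfile("aggressive", bluff_chance=0.4, min_win_chance=0.2),
]
```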
Find out how you want to compare the tactics. Do you want to know what percentage chance each tactic has to win? Or do you want to run some random games for some other reason?
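If win percentage is what you are after, the comparison itself is simple. The sketch below assumes a `play_one_game` function (your own game engine, not shown here) that returns some identifier of the winning tactic:

```python
# Sketch: estimate each tactic's win percentage by playing many games.
from collections import Counter

def win_percentages(tactics, play_one_game, games=10_000):
    wins = Counter()
    for _ in range(games):
        winner = play_one_game(tactics)  # assumed: returns the winner's identifier
        wins[winner] += 1
    return {tactic_id: 100.0 * count / games for tactic_id, count in wins.items()}
```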
Every game has several dialects, so find out which set of rules you want to follow and implement exactly those rules. The simulation will be incorrect if you don't do this.
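One way to avoid mixing dialects by accident is to make the chosen rule set an explicit object that the whole simulation reads from. The fields below are invented examples, not the rules of any specific game:

```python
# Sketch: pin the chosen rule dialect down in one place so every part of the
# simulation uses the same variant. All field names are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class RuleSet:
    deck_size: int = 52
    jokers_allowed: bool = False
    max_raises_per_round: int = 3

HOUSE_RULES = RuleSet(deck_size=32, jokers_allowed=False, max_raises_per_round=2)
```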
Implement all tactics, and share the same functions between them as much as possible. This way you are testing the core functions automatically. With this approach you will be done sooner and you will find bugs that would otherwise never have been found.
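As an illustration of sharing core functions, the toy engine below implements dealing, scoring, and winner detection once, while each tactic is only a small decision function. The "highest card wins" game is made up purely to show the structure, not the author's actual game:

```python
# Sketch: one shared engine, many tactics. Every tactic goes through the same
# deal() and score() code, so those core functions get exercised constantly.
import random

def deal(deck, players):                 # shared by every tactic
    random.shuffle(deck)
    return [deck[i::players] for i in range(players)]

def score(hand):                         # shared by every tactic
    return max(hand)

def play_one_game(tactics):
    hands = deal(list(range(52)), len(tactics))
    # Each tactic may choose which cards to keep; scoring is one code path.
    kept = [tactic(hand) for tactic, hand in zip(tactics, hands)]
    return max(range(len(tactics)), key=lambda i: score(kept[i]))

# Example tactics as plain functions plugged into the shared engine:
keep_all = lambda hand: hand
keep_high_cards = lambda hand: [c for c in hand if c >= 26] or hand
print(play_one_game([keep_all, keep_high_cards]))
```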
Test the system. Testing is very important when simulating; one small problem can cause a wrong result. Step through your code with a debugger to check that it does the right thing. Make a "debug mode" that shows the results of every game, and review them. Also let other people review how the games are played.
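A "debug mode" can be as simple as one flag that prints the full result of every game so you or someone else can review how it was played. The shape of the printed result is an assumption here:

```python
# Sketch: a single DEBUG flag that dumps every game's result for review.
DEBUG = True

def run_games(tactics, play_one_game, games):
    for game_number in range(games):
        result = play_one_game(tactics)  # assumed: returns something printable
        if DEBUG:
            print(f"game {game_number}: {result}")
```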