Hello,
I ran two experiments. Both showed differences that looked large, but the p-values were too high for the results to be considered statistically significant.
For future reference, I (and probably others) would like to know:
What is the recommended number of users to run through an experimental variant to give yourself a good chance of producing a statistically significant result?
There are some assumptions in this question that I would like to make explicit:
- The true change in the metric will either be 0 or at least 20%.
- We only care about one variable.
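For what it's worth, under those two assumptions this is a standard power-analysis question, and a rough answer can be computed directly. The sketch below (my own illustration, not anything PlayFab provides; the baseline conversion rate of 10% and the significance/power levels are assumptions) uses the textbook two-proportion z-test formula to estimate how many users each variant needs:

```python
from math import sqrt, ceil
from statistics import NormalDist

def users_per_variant(p1, p2, alpha=0.05, power=0.8):
    """Approximate users needed in EACH variant to detect a shift from
    rate p1 to rate p2 with a two-sided two-proportion z-test.
    Standard power-analysis formula; alpha = false-positive rate,
    power = chance of detecting the effect if it is real."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_b = NormalDist().inv_cdf(power)          # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Assumed example: 10% baseline, 20% relative lift (10% -> 12%)
# needs on the order of 3,800-3,900 users per variant.
print(users_per_variant(0.10, 0.12))
```

The takeaway is that the required volume depends heavily on the baseline rate: a 20% relative change on a small baseline needs thousands of players per variant, while the same relative change on a large effect needs far fewer.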
Facebook, for instance, gives you a guess of your chance of generating a statistically significant outcome when you are setting up an A/B test in an ad campaign. You can pay more for a better chance or less for a lower chance. I'm not asking for a feature like that. I'm just hoping someone has some guidance that can help me build better experiments.
An alternative form of this question that would be equally informative is this:
How did PlayFab engineers and product team members envision experiments being used? e.g.,
- Expected volume of players
- Anticipated experiment length
- Potentially destabilizing external forces
Any guidance would help me build a business on your platform using a Lean Startup approach, and would probably help a lot of other engineers as well.
Thanks,
Max