Head-Mounted Displays (HMDs) have become popular devices, drastically increasing the use of Virtual, Mixed, and Augmented Reality. Although these systems offer accurate and immersive visual resources, precise interaction still depends on depth cameras or special joysticks, which either demand complex hardware or fail to follow natural body expression. This work presents an approach for using bare hands to control an immersive game from an egocentric perspective, built from a proposed case-study methodology. We used a DenseNet Convolutional Neural Network (CNN) architecture to perform real-time recognition in both indoor and outdoor environments, without requiring any image segmentation step. Our research also generated a gesture vocabulary based on users' preferences, seeking a set of natural and comfortable hand poses, and evaluated users' satisfaction and performance in an entertainment setup. Our recognition model achieved an accuracy of 97.89%. User studies show that our method outperforms classical controllers with respect to natural interaction. We demonstrate our results using commercial low-end HMDs and compare our solution with state-of-the-art methods.