A new colleague needed some help to set up a Graylog installation. He had never done this before, so he asked for assistance. What follows is a rehash of an email I sent him on how to proceed and build knowledge on the subject:
So initially I had zero knowledge of Graylog. What I did to familiarize myself with it was to download an OVA file with a prepared virtual machine and run it in VMware Fusion. The same VM can also be imported into VirtualBox and even into AWS, although Graylog also provides ready-made AMIs for deployment in AWS. Links:
– AWS AMIs
Keep in mind that this is a full installation of everything Graylog needs to work, and it also comes with a handy little script named “graylog-ctl” that manipulates a lot of configuration. The big catch is that graylog-ctl is not part of any standard Graylog deployment. It only comes with the OVA and the AMI images.
So after I had some fun with it on a VM on my workstation, reading the documentation and testing stuff, I did an initial deployment of the AMI image in AWS. But this is not an installation that can scale, which brings us to the next steps:
- For Graylog to work you need to provide it with a MongoDB and an ElasticSearch database. It is your choice whether these will be clustered for high availability or not, and whether they will run on the same machine or not. You control the complete architecture. So in my case I made the following decisions:
- I am running a MongoDB replica set using three VMs. This is a standard setup as it is described in the MongoDB online documentation. Since it is not password protected, it only accepts connections from the Graylog instance. I used AWS security groups for that.
- I am using an ElasticSearch cluster with three VMs where every node acts as both a data node and a master-eligible node. If you can, use seven nodes: three masters (lower-end machines, since they do not run queries and do not index any data) and four data nodes (higher-end machines). Again, since this is not password protected, I used AWS security groups to allow access only from the Graylog instance.
- I am running a single Graylog instance on a separate VM. Currently it only listens for syslog messages. When the need arises, I will add two more nodes to increase availability. I think I changed only four or five lines in the main configuration file. Graylog uses MongoDB to store its configuration, which includes anything you configure via the web interface.
- Pay extra attention to the versions of ElasticSearch and MongoDB that your Graylog version requires. Use exactly what is mentioned in the documentation. For example, in my case I am not running ES 6.x but the latest 5.x.
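To make the architecture above a bit more concrete, here is a minimal Python sketch that generates the two lines of the Graylog configuration file that point it at the MongoDB replica set and the ElasticSearch cluster. The hostnames (mongo1, es1, and so on), the replica set name rs0, the database name graylog and the default ports are all placeholders I made up for illustration. The setting names mongodb_uri and elasticsearch_hosts are what I would expect on a Graylog version that works with ES 5.x, but check them against the documentation for your release. The small quorum helper shows why three master-eligible nodes is the usual minimum: the majority is two, so the cluster survives the loss of one node.

```python
from typing import List


def quorum(master_eligible_nodes: int) -> int:
    """Return the majority of master-eligible nodes: floor(n / 2) + 1.

    In ES 5.x this is the value usually recommended for
    discovery.zen.minimum_master_nodes (assumption: zen discovery is in use).
    """
    if master_eligible_nodes < 1:
        raise ValueError("at least one master-eligible node is required")
    return master_eligible_nodes // 2 + 1


def graylog_conf_lines(
    mongo_hosts: List[str],
    es_hosts: List[str],
    replica_set: str = "rs0",   # hypothetical replica set name
    database: str = "graylog",  # hypothetical database name
) -> List[str]:
    """Build the MongoDB and ElasticSearch lines for the Graylog config file."""
    # Default ports assumed: 27017 for MongoDB, 9200 for ElasticSearch HTTP.
    mongo = ",".join(f"{host}:27017" for host in mongo_hosts)
    es = ",".join(f"http://{host}:9200" for host in es_hosts)
    return [
        f"mongodb_uri = mongodb://{mongo}/{database}?replicaSet={replica_set}",
        f"elasticsearch_hosts = {es}",
    ]


if __name__ == "__main__":
    # Placeholder hostnames; replace with the private names of your VMs.
    for line in graylog_conf_lines(["mongo1", "mongo2", "mongo3"],
                                   ["es1", "es2", "es3"]):
        print(line)
    print("minimum master nodes for 3 masters:", quorum(3))
```

Neither connection string carries credentials, which matches the unauthenticated setup described above where security groups do the access control.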
Now it is time to up your game. Once you see that your installation is working you have to decide whether to password protect access to MongoDB and ElasticSearch and whether to encrypt traffic between all those instances or not. I say give it a go.
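If you do go for password protection, the MongoDB connection string is one of the places that changes. The sketch below, again in Python, builds such a URI with percent-encoded credentials. The user name, password, hostnames, replica set name and database name are hypothetical, and whether the encryption option is spelled ssl=true or tls=true depends on the driver version your Graylog release ships with, so treat that part as an assumption to verify.

```python
from typing import List
from urllib.parse import quote


def secured_mongodb_uri(
    user: str,
    password: str,
    hosts: List[str],
    database: str = "graylog",  # hypothetical database name
    replica_set: str = "rs0",   # hypothetical replica set name
    use_tls: bool = True,
) -> str:
    """Build a MongoDB URI with credentials and an optional encryption flag."""
    # Characters such as '@', ':' and '/' must be percent-encoded in the
    # user name and password, otherwise the URI cannot be parsed correctly.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    host_list = ",".join(f"{host}:27017" for host in hosts)
    options = [f"replicaSet={replica_set}"]
    if use_tls:
        # Assumption: older drivers expect ssl=true, newer ones accept tls=true.
        options.append("ssl=true")
    return f"mongodb://{credentials}@{host_list}/{database}?{'&'.join(options)}"


if __name__ == "__main__":
    # Made-up credentials and hostnames, for illustration only.
    print(secured_mongodb_uri("graylog", "p@ss:w/rd", ["mongo1", "mongo2", "mongo3"]))
```

The resulting string would go into the same configuration setting that holds the unauthenticated URI today, once the user exists on the MongoDB side.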
I’ve not even touched issues like database management for Mongo and Elastic, backing them up, restoring, deleting indices, etc., because this post is about going from zero to your first week of testing Graylog. There is plenty of stuff out there to take you to the next level, once you get used to the complexity of the software involved.
Should you need any more help, ping me anytime.