For the past two years, I've been helping the network operations team at my current company develop a network automation skillset. I started this journey during my bachelor's thesis, and we've used that research to decide which languages to learn and which tools to use. This post contains my notes on the various tools I've looked at; I hope it helps you plot a course on your own journey towards modern-day network operations.

Getting started

First and foremost: get your team to learn some Python. Even if you're going to use an orchestration platform, knowing a little Python really helps. A good place to start is the free email course Python for Network Engineers. We did it as a team so we could motivate each other and compare solutions; just pointing people to the link is generally not enough.

Once you know a little Python, you'll quickly get annoyed by all the screen scraping and regular expressions, so it's time to look at some modules that can help:

  • Netmiko - a library that builds on the widely used Paramiko SSH library to simplify SSH connections to network devices. It handles all sorts of prompts, enable passwords, and so on.
  • NAPALM - a cross-platform, uniform API for network devices. You can use it to perform actions like configuration replace and rollback, and to gather structured data from Cisco, Juniper, and Arista devices.
  • Jinja2 - an easy-to-use templating language that you can use to build and render configuration templates for your devices. It can get really ugly when you add too much logic, but in many cases it's a great way to get started with infrastructure as code.
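
To give an idea of what the Jinja2 approach looks like, here's a minimal sketch that renders an access-port configuration from a Python dictionary. The hostname, interface names, descriptions, and VLAN numbers are made-up example data, the IOS-style output is only illustrative, and it assumes the jinja2 package is installed (pip install jinja2).

```python
from jinja2 import Environment, StrictUndefined

# Hypothetical example template: IOS-style access-port configuration.
TEMPLATE = """\
hostname {{ hostname }}
{% for intf in interfaces %}
interface {{ intf.name }}
 description {{ intf.description }}
 switchport mode access
 switchport access vlan {{ intf.vlan }}
{% endfor %}
"""

# StrictUndefined makes a missing variable raise an error instead of
# silently rendering an empty string into the device config.
# trim_blocks/lstrip_blocks keep the {% %} tags from leaving blank lines.
env = Environment(undefined=StrictUndefined, trim_blocks=True, lstrip_blocks=True)
template = env.from_string(TEMPLATE)

# Hypothetical device data; in practice this would come from your inventory.
device = {
    "hostname": "access-sw1",
    "interfaces": [
        {"name": "GigabitEthernet0/1", "description": "Printer", "vlan": 10},
        {"name": "GigabitEthernet0/2", "description": "Access point", "vlan": 20},
    ],
}

config = template.render(**device)
print(config)
```

Keeping the logic in the data (the list of interfaces) and the template as plain as possible is what keeps Jinja2 from getting ugly.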

Orchestration tools

It's important to understand the basics, but there are two major problems with doing everything yourself using Python:

  1. How do you handle your inventory?
  2. Looping sequentially over hundreds (or even dozens) of devices takes far too long.

This is where orchestration tools come in.
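
To make both problems concrete, here's a standard-library-only sketch of roughly what these tools do for you: keep a structured inventory, and run a task against many hosts concurrently. The inventory, the host names, and the get_version task are hypothetical; the task just sleeps to simulate waiting on a device.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inventory: in a real setup this would live in YAML files,
# a CMDB, or another source of truth.
INVENTORY = {
    f"sw{n:03d}": {"host": f"192.0.2.{n}", "platform": "ios"} for n in range(1, 51)
}


def get_version(name: str, data: dict) -> str:
    """Stand-in for a real SSH/API call; sleeps to simulate device latency."""
    time.sleep(0.05)
    return f"{name} ({data['host']}): ok"


def run_sequential(inventory: dict) -> list[str]:
    # One device at a time: total time grows linearly with the inventory.
    return [get_version(name, data) for name, data in inventory.items()]


def run_parallel(inventory: dict, workers: int = 20) -> list[str]:
    # Network tasks mostly wait on I/O, so threads work well here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(get_version, n, d) for n, d in inventory.items()]
        # Collect results in inventory order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    for runner in (run_sequential, run_parallel):
        start = time.perf_counter()
        results = runner(INVENTORY)
        print(f"{runner.__name__}: {len(results)} hosts in "
              f"{time.perf_counter() - start:.2f}s")
```

With 50 hosts at a simulated 0.05 s each, the sequential run takes about 2.5 seconds while the threaded run finishes in a fraction of that. Real orchestration tools add error handling, connection handling, and result reporting on top of this basic idea.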

Ansible

Ansible is by far the most widely used tool in network automation, so there is lots of information available and it's easy to get started. Ansible is part of Red Hat (now owned by IBM), so there is big money behind its development.
Cisco has a good tutorial on Ansible, and Juniper has a great Day One book: Automating Junos with Ansible.

Salt

Since late 2016 you can manage networking equipment with Salt, and it has been gaining traction within the networking community. The nice thing about Salt is that it can react to events, rather than only running user-initiated jobs like Ansible. The drawback is that it relies on NAPALM, so it's limited to the types of devices that NAPALM supports.
If you want to get started with Salt, check out this tutorial by Mircea Ulinic, and the free ebook Network Automation at Scale over at Cloudflare.
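
Salt's event-driven model boils down to matching event tags against patterns and triggering a reaction. The toy dispatcher below illustrates that idea in plain Python; it is not Salt's reactor API, and the event tags, payload fields, and open_ticket handler are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Callable

Handler = Callable[[dict], str]


def open_ticket(event: dict) -> str:
    # Hypothetical reaction; a real one might call a ticketing API
    # or push a configuration change to the device.
    return f"ticket: {event['interface']} down on {event['host']}"


# Hypothetical tag patterns mapped to handlers, similar in spirit to how a
# Salt reactor maps event tags to reaction files.
REACTOR: dict[str, Handler] = {
    "network/*/interface/down": open_ticket,
}


def dispatch(tag: str, event: dict) -> list[str]:
    """Run every handler whose glob pattern matches the event tag."""
    return [handler(event) for pattern, handler in REACTOR.items()
            if fnmatchcase(tag, pattern)]


if __name__ == "__main__":
    print(dispatch("network/edge-rtr1/interface/down",
                   {"host": "edge-rtr1", "interface": "xe-0/0/1"}))
    print(dispatch("network/edge-rtr1/bgp/down", {"host": "edge-rtr1"}))
```

The point is that nobody has to start a job: the event itself triggers the reaction, which is what sets Salt apart from Ansible's user-initiated runs.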

Nornir

Nornir is a relatively new entry in this space. It's a Python library that solves the two problems I mentioned at the top of this section. An important advantage over Ansible and Salt is that you write Python rather than a domain-specific language (a YAML-based DSL in both cases); this makes it far easier to add logic and (unit) tests to your code, and to debug it when it breaks.
There's a good introduction over at NetworkLore, and the Nornir documentation is quite readable too.
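
As an illustration of the testability argument, here's a plain Python function of the kind you might call from inside a Nornir task, with unit tests next to it. Nornir itself is not imported here; the function name and the VLAN data layout are hypothetical.

```python
import unittest


def vlans_to_configure(intended: dict[int, str], actual: dict[int, str]) -> dict[int, str]:
    """Return VLANs that are in the intended state but absent or misnamed on the device."""
    return {vid: name for vid, name in intended.items() if actual.get(vid) != name}


class VlansToConfigureTest(unittest.TestCase):
    def test_reports_absent_and_misnamed_vlans(self):
        intended = {10: "users", 20: "voice", 30: "printers"}
        actual = {10: "users", 20: "VOICE"}
        self.assertEqual(vlans_to_configure(intended, actual),
                         {20: "voice", 30: "printers"})

    def test_in_sync_device_needs_nothing(self):
        # Extra VLANs on the device are ignored by this (deliberately simple) check.
        self.assertEqual(vlans_to_configure({10: "users"}, {10: "users", 99: "extra"}), {})
```

Saved as, say, vlan_checks.py, this runs with python -m unittest vlan_checks, without touching a single device.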

So which one to choose?

Your first order of business is to hop over to the server management team. Chances are they're already using an orchestration tool, and if it's Ansible or Salt it would be a shame not to use their knowledge, even if you don't end up merging workflows or sharing the same back-end systems.

Similarly, if there's an in-house software development team using Python, or if your team is already well-versed in programming, Nornir might be the best way to go; in that case it's probably the most versatile and extensible tool.

Keep in mind, though, that both Salt and Nornir use NAPALM, and are thus limited to NAPALM's list of supported devices. Ansible supports many more network platforms: if you want to manage lots of non-Cisco/Juniper/Arista gear, you're probably better off with Ansible.

If after all this you still don't know which tool is best for you, go with Ansible. It's the safe choice: nearly everyone is using it, and there's plenty of information online to get you started. And the tool you actually use to automate things is always better than the perfect tool you don't use.

What to do first?

Where should you focus when you start with network automation? The obvious place to start is where you can save time (repetitive work, or toil), but it's often hard to show business value there. You need to spend a lot of effort measuring things, and you're not going to let staff go just because you work faster, are you?

I prefer to borrow from Lean manufacturing and focus on QCD: the things that matter to our customers are quality, cost, and delivery (speed). The lesson from manufacturing is that it's always best to start with a focus on quality: make sure you do things "First Time Right" to prevent rework, include all required configuration for monitoring, security, and logging systems, and so on. This increases speed of delivery (less rework), which in turn reduces cost, and it makes your customers happy.
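
A small step towards "First Time Right" is to check every generated configuration for the things that must always be present before it gets anywhere near a device. The sketch below assumes IOS-style configuration text, and the required patterns (syslog, NTP, SNMP, AAA) are hypothetical placeholders for your own standards.

```python
import re

# Hypothetical baseline: every device config must contain these.
REQUIRED = {
    "syslog server": r"^logging host \S+",
    "NTP server": r"^ntp server \S+",
    "SNMP monitoring": r"^snmp-server (community|host|group) ",
    "AAA": r"^aaa new-model$",
}


def missing_baseline(config: str) -> list[str]:
    """Return the names of baseline items not found in the config."""
    return [name for name, pattern in REQUIRED.items()
            if not re.search(pattern, config, flags=re.MULTILINE)]


if __name__ == "__main__":
    candidate = """hostname access-sw1
aaa new-model
logging host 192.0.2.10
ntp server 192.0.2.123
"""
    problems = missing_baseline(candidate)
    if problems:
        print("Refusing to deploy, missing:", ", ".join(problems))
```

Here the candidate lacks any SNMP configuration, so the device would never show up in monitoring; catching that before deployment is exactly the kind of rework this approach prevents.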

This dovetails nicely with the new trend of Network Reliability Engineering.

What to do next?

Once you start automating, you can put all your logic in Git repositories, and this is where you can really improve your processes. Take a look at GitHub Flow to embed peer review by default. GitLab CI is also very nice for automated validation and testing.
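
Automated validation in a CI pipeline can start very simply: a script that checks your inventory data and exits with a non-zero status when something is wrong, which fails the pipeline job. The sketch below assumes the inventory is a JSON file mapping host names to objects with an mgmt_ip field; that layout and the file name inventory.json are hypothetical.

```python
import ipaddress
import json
import sys


def validate_inventory(inventory: dict[str, dict]) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    seen_ips: dict[str, str] = {}
    for name, data in inventory.items():
        mgmt = data.get("mgmt_ip")
        if mgmt is None:
            errors.append(f"{name}: missing mgmt_ip")
            continue
        try:
            ip = str(ipaddress.ip_address(mgmt))  # normalizes the address
        except ValueError:
            errors.append(f"{name}: invalid mgmt_ip {mgmt!r}")
            continue
        if ip in seen_ips:
            errors.append(f"{name}: mgmt_ip {ip} already used by {seen_ips[ip]}")
        else:
            seen_ips[ip] = name
    return errors


def main(path: str) -> int:
    with open(path, encoding="utf-8") as fh:
        errors = validate_inventory(json.load(fh))
    for error in errors:
        print(error, file=sys.stderr)
    return 1 if errors else 0  # non-zero exit fails the CI job


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} INVENTORY.json", file=sys.stderr)
    else:
        sys.exit(main(sys.argv[1]))
```

In a GitLab CI job this would just be a script line along the lines of python validate_inventory.py inventory.json, so every merge request gets checked before anyone reviews it.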

You'll also notice that handling passwords and secrets programmatically is hard, especially when you're working in a team and it's not just your own credentials on your own device that you need to keep safe. Check out HashiCorp Vault as a way to start solving this.
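
As a rough sketch of what reading a secret from Vault can look like, the snippet below calls Vault's HTTP API for the KV version 2 secrets engine using only the standard library. It assumes a KV v2 engine mounted at secret/, the server address in the VAULT_ADDR environment variable, and a token in VAULT_TOKEN; the secret path network/core-switches is hypothetical. In real code the hvac client library is the more comfortable option.

```python
import json
import os
import urllib.request


def kv2_url(addr: str, path: str, mount: str = "secret") -> str:
    """KV v2 reads go to /v1/<mount>/data/<path>."""
    return f"{addr.rstrip('/')}/v1/{mount}/data/{path.strip('/')}"


def extract_secret(body: dict) -> dict:
    """KV v2 wraps the key/value pairs in data.data (data.metadata holds version info)."""
    return body["data"]["data"]


def read_secret(path: str, mount: str = "secret") -> dict:
    request = urllib.request.Request(
        kv2_url(os.environ["VAULT_ADDR"], path, mount),
        headers={"X-Vault-Token": os.environ["VAULT_TOKEN"]},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_secret(json.load(response))
```

A call like read_secret("network/core-switches") would then return the key/value pairs stored at that path, such as a username and password, so the credentials never have to live in your Git repository; who may read which path is governed by Vault policies.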

Want to know more, or go into more detail? Read Network Programmability and Automation by Scott Lowe, Jason Edelman, and Matt Oswalt. It's written by some of the top experts in the field and covers nearly everything I've mentioned here.