Gone are the days when those of us not fluent in SQL found ourselves perched on dev.mysql.com, verifying just one more time that the syntax of a pages-long command to import a sensitive database we'd been trusted with was precisely correct, only to get an error message over a missing ";" or something. Oh, it was just me who did that?!
Well, never mind. Anyway, in case you couldn't tell from that overly dramatic introduction, database administration (DBA) is not my favorite activity - not because it's terrible, but because it was intimidating, for a while at least.
Fortunately, as mentioned in the first sentence, many of the more daunting components of DBA have either been automated or solved in other, more accessible ways, thanks to Developers Like You. Whether out of need, frustration, or the kindness of their hearts, they have created a suite of open source tools that solve a lot of these problems.
Excellent question. Let's talk about it a bit. If you've been in the tech space for a while, you've most likely heard and/or used a plethora of seemingly ever-evolving descriptors for the different specialties that fall under the DevOps umbrella - GitOps among them.
For the sake of this article, we're going to focus on GitOps (but if you'd like a dive into other concepts, ping us on K8s Slack). GitOps - using Git as a single source of truth for declarative infrastructure & applications - is a Cloud Native type of Infrastructure as Code, as it builds on & combines some critical components of IaC, including (but once again, not limited to) declarative configuration, version control, and automation.
The main difference is that GitOps is tailored explicitly for containerized environments. One of the key benefits of applying GitOps best practices is that, while it enables the implementation of automated CI/CD pipelines, there’s always a “single source of truth” at the infrastructure & application level. This is thanks to a myriad of available tools to compare the actual production state of the environment in question with what’s under source control, which can be set to alert when a significant mismatch occurs.
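To make that idea concrete, even plain `kubectl` can surface drift between what's committed and what's live. Here's a minimal sketch, assuming a hypothetical `./manifests/` directory holding the cluster config as it exists in Git:

[.pre]
# Compare committed manifests against the live cluster state.
# Exit status 0 means no drift; 1 means the live state differs from Git.
kubectl diff -f ./manifests/
[.pre]

Dedicated GitOps tooling runs this sort of comparison continuously rather than on demand.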
Now, what if I told you that the above concept could be applied to database schema migration?
Back in The Before Times, DB schema migration scripts looked something like the example below: walls of code, usually not nearly as well documented, that you hoped would perform the needed task promptly and with as little data loss as possible. And even if said script worked perfectly, writing it was still a pain. It's no wonder that the mere mention of database applications in Kubernetes triggered a fight-or-flight response in some (some, being me), as adding a containerization layer to an already daunting task could leave one feeling a bit overwhelmed.
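Here's a hypothetical sketch of one of those scripts (every name in it is invented for illustration):

[.pre]
-- 003_create_airport.sql: a hand-written, forward-only migration.
BEGIN;

-- Track which migrations have already run so this one isn't applied twice.
CREATE TABLE IF NOT EXISTS schema_migrations (
    version    integer PRIMARY KEY,
    applied_at timestamp NOT NULL DEFAULT now()
);

-- The actual schema change.
CREATE TABLE airport (
    code char(4) NOT NULL PRIMARY KEY,
    name varchar(255)
);

-- Record that migration 3 has been applied.
INSERT INTO schema_migrations (version) VALUES (3);

COMMIT;
[.pre]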
Upon further inspection, we learned that the exact opposite is true. Several tools are available that make the process easier to understand & execute while also providing a crucial verification layer - a cornerstone of GitOps best practices - that allows the administrator to double- or triple-check that what's about to happen is what's intended.
One such tool is SchemaHero, a Kubernetes operator-slash-CLI utility that converts DB schema definitions into migration scripts. Instead of writing a series of scripts in Python/Go/Ruby/XML or whatever your preferred language happens to be, Kubernetes administrators can create, import, and manage containerized databases using the same YAML definitions used for virtually every other component of a Kubernetes cluster.
What they look like now:
[.pre]
apiVersion: schemas.schemahero.io/v1alpha4
kind: Table
metadata:
  name: airport
  namespace: schemahero-tutorial
spec:
  database: airlinedb
  name: airport
  schema:
    postgres:
      primaryKey: [code]
      columns:
        - name: code
          type: char(4)
        - name: name
          type: varchar(255)
          constraints:
            notNull: true
[.pre]
We will start easy by going through the onboarding tutorial from the SchemaHero docs, which walks us through building a PostgreSQL database using SchemaHero. To follow along, you'll need a running Kubernetes cluster. I'm using one of my trusty NUCs running Ubuntu 20.04.4 LTS because why not, but if you don't have a surplus of mini servers available for playing around, you can quickly deploy a dev cluster using kURL, K3s, or even minikube if you're feeling ambitious.
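K3s, for instance, can be installed with a single command from its official script (run with root privileges; this is the install documented at k3s.io):

[.pre]
# Installs K3s as a systemd service; the kubeconfig lands at
# /etc/rancher/k3s/k3s.yaml.
curl -sfL https://get.k3s.io | sh -
[.pre]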
Once the cluster is up and running, we will need to install Krew, a kubectl plugin manager that acts as a repository for other kubectl plugins. The following command will download & install the krew binary:
[.pre]
(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)
[.pre]
Once installation is complete, update `~/.bashrc` to add krew's bin directory to your `PATH` environment variable, then reload your shell.
[.pre]
~]% export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
~]% exec bash
[.pre]
Now you should be able to access the krew command using kubectl. To be sure, let’s run a test command by listing out all plugins currently installed using krew:
[.pre]
trilliams@replicated:~$ kubectl krew list
PLUGIN   VERSION
krew     v0.4.3
[.pre]
If you get an error message saying that krew isn’t installed, check `~/.bashrc` to verify that `$HOME/.krew/bin` was added to your path, then reload your shell.
Once Krew is up and running, we can move on to installing the SchemaHero plugin, which is literally a single command that adds the kubectl-schemahero binary to `$PATH`:
[.pre]
trilliams@replicated:~$ kubectl krew install schemahero
Updated the local copy of plugin index.
Installing plugin: schemahero
Installed plugin: schemahero
…
[.pre]
Once installation is complete, you may see a scary-looking warning message similar to this:
[.pre]WARNING: You installed plugin "schemahero" from the krew-index plugin repository. These plugins are not audited for security by the Krew maintainers. Run them at your own risk.[.pre]
Rest assured that SchemaHero is a safe, secure, and verified plugin that works really well and is currently in the CNCF Sandbox. There will also be a message pointing to the official docs and advising on the next step - adding the SchemaHero components to the cluster. This is done with the command `kubectl schemahero install`, as shown below:
[.pre]
trilliams@replicated ~ % kubectl schemahero install
The SchemaHero operator has been installed to the cluster
trilliams@replicated ~ % kubectl get ns
NAME                STATUS   AGE
default             Active   7m12s
kube-node-lease     Active   7m13s
kube-public         Active   7m13s
kube-system         Active   7m13s
schemahero-system   Active   6s
[.pre]
This command will kick off the installation of the SchemaHero in-cluster operator under the `schemahero-system` namespace. Depending on your cluster's available resources, this step should take anywhere from a few seconds to a few minutes. Once the `schemahero-0` operator is in Running status, we're ready to jump into the fun part: creating a database to connect to SchemaHero.
[.pre]
trilliams@replicated ~ % kubectl get pods -n schemahero-system
NAME           READY   STATUS    RESTARTS   AGE
schemahero-0   1/1     Running   0          45s
[.pre]
Before we can connect a database, we'll first need to create one. Fortunately, for the tutorial's sake, this has been simplified by a handy YAML file available in the SchemaHero GitHub repository that will generate a Postgres server. So let's create a new namespace named `schemahero-tutorial` for the Postgres instance to live under:
[.pre]
~ % kubectl create ns schemahero-tutorial
namespace/schemahero-tutorial created
[.pre]
Once the namespace is created, use the example config referenced above to create the Postgres instance:
[.pre]
~ % kubectl apply -n schemahero-tutorial -f https://raw.githubusercontent.com/schemahero/schemahero/main/examples/tutorial/postgresql/postgresql-11.8.0.yaml
trilliams@replicated:~$ kubectl get po --namespace=schemahero-tutorial
NAME           READY   STATUS    RESTARTS   AGE
postgresql-0   1/1     Running   0          143m
[.pre]
Once the `postgresql-0` container is Running, our next step is to give the SchemaHero operator we installed earlier the information it needs to manage the database, which we do by deploying a custom resource containing the connection details to the cluster. Now, that sounds like a lot, but it's pretty straightforward. We're going to create a new YAML database definition named `airline-db.yaml` that will set up a few parameters needed for interaction with SchemaHero.
This definition will establish the object's group, version, and kind, set a name for the database that other SchemaHero objects will use, set the namespace where SchemaHero is to watch for schemas, and, under `spec`, point to the credentials needed to connect and authenticate to the database.
[.pre]
apiVersion: databases.schemahero.io/v1alpha4
kind: Database
metadata:
  name: airlinedb
  namespace: schemahero-tutorial
spec:
  connection:
    postgres:
      uri:
        valueFrom:
          secretKeyRef:
            name: postgresql
            key: uri
[.pre]
The definition is added to the cluster with the command:
[.pre]~] % kubectl apply -f ./airline-db.yaml[.pre]
We can verify that the object has deployed properly with the following command:
[.pre]
trilliams@replicated:~$ kubectl get databases -n schemahero-tutorial
NAME        AGE
airlinedb   2h
[.pre]
Next up, we’ll need to create a new table. Now, if we weren’t using SchemaHero, a command creating a new table would look something like this:
[.pre]
> CREATE TABLE airport (
    code char(4) not null primary key,
    name varchar(255)
  );
[.pre]
Let’s not do that. Instead, let’s create a new file named `airport-table.yaml` containing the following:
[.pre]
apiVersion: schemas.schemahero.io/v1alpha4
kind: Table
metadata:
  name: airport
  namespace: schemahero-tutorial
spec:
  database: airlinedb
  name: airport
  schema:
    postgres:
      primaryKey: [code]
      columns:
        - name: code
          type: char(4)
        - name: name
          type: varchar(255)
          constraints:
            notNull: true
[.pre]
In this object, we'll once again declare Group, Version, and Kind, then establish the name of our Table - in this case, it's airport - and declare the namespace to deploy the Table object to. Under `spec`, we'll declare the database name and the Postgres table name, & define the table schema, including the primary key, the columns, and their Postgres-compatible types.
Once you’ve saved the file, deploy the table object to SchemaHero by running the command:
[.pre]~] % kubectl apply -f ./airport-table.yaml[.pre]
Now we're at the cool part, where we get a crucial moment to pause and verify that the DB schema information is correct before any damage can occur. Let's check out our pending migration with the command `kubectl schemahero get migrations`:
[.pre]
~$ kubectl schemahero get migrations -n schemahero-tutorial
ID        DATABASE    TABLE     PLANNED   EXECUTED   APPROVED   REJECTED
eaa36ef   airlinedb   airport   2h
[.pre]
With this command, we get the basics: the migration ID, DB name, DB table name, and how long ago the migration was planned. At this point, no actual change has been made to any DB schema. Let's take a closer look at the planned migration with the command `kubectl schemahero describe`:
[.pre]
trilliams@replicated:~$ kubectl schemahero describe migration eaa36ef -n schemahero-tutorial
Migration Name: eaa36ef

Generated DDL Statement (generated at 2022-03-21T11:49:00Z):
  create table "airport" ("code" character (4), "name" character varying (255) not null, primary key ("code"))
…
[.pre]
Under “Generated DDL Statement,” SchemaHero prints the *exact* SQL statement or statements that will run if the migration is approved. So let's go ahead and approve the migration:
[.pre]
trilliams@replicated:~$ kubectl schemahero -n schemahero-tutorial approve migration eaa36ef
Migration eaa36ef approved
[.pre]
Since our migration is only a single statement, this should complete fairly quickly, which we can verify by running `kubectl schemahero get migrations` one more time:
[.pre]
trilliams@replicated:~$ kubectl schemahero get migrations -n schemahero-tutorial
ID        DATABASE    TABLE     PLANNED   EXECUTED   APPROVED   REJECTED
eaa36ef   airlinedb   airport   3h        45s        45s
[.pre]
We see that migration ID eaa36ef was approved and executed 45 seconds ago, and that it took essentially no time to complete. If you so desire, you can verify the migration using open source tools like Beekeeper Studio or whichever DB management tool you prefer.
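If you'd rather verify from the terminal, one way to do it (reusing the `postgresql` secret and `postgresql-0` pod from the steps above) is to run `psql` inside the Postgres pod against the same connection URI we handed to SchemaHero:

[.pre]
# Decode the connection URI from the secret, then describe the new table
# with psql from inside the postgresql-0 pod.
URI=$(kubectl get secret postgresql -n schemahero-tutorial \
  -o jsonpath='{.data.uri}' | base64 -d)
kubectl exec -n schemahero-tutorial postgresql-0 -- psql "$URI" -c '\d airport'
[.pre]

From here, future schema changes follow the same loop: edit `airport-table.yaml`, apply it, then review and approve the newly planned migration.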
Once done, you can clean up your cluster by deleting the `schemahero-tutorial` namespace with the command `kubectl delete namespace`. This should get rid of any pods, services, definitions, etc. that we created while familiarizing ourselves with SchemaHero:
[.pre]
trilliams@replicated:~$ kubectl delete namespace schemahero-tutorial
namespace "schemahero-tutorial" deleted
[.pre]
If you want to take it a step further and completely remove SchemaHero from your system (though IDK why you would do that), delete the schemahero-system operator, then remove the kubectl-schemahero plugin from your system using krew:
[.pre]
trilliams@replicated:~$ kubectl krew uninstall schemahero
Uninstalled plugin: schemahero
[.pre]
Or, if you want to continue on our journey of familiarizing ourselves with SchemaHero, keep the operator & plugin ready for the next edition of this blog, where we’ll venture out into uncharted territory to learn even more about how to be Heroes.
You made it to the end! Thanks for reading and following along! In the meantime, please feel free to join our next Community Meeting on April 21st at 12:30 PM Pacific/3:30 PM Eastern.