Peacemaker
The API deployment is built with a Dockerfile and deployed using Nomad. Jenkins mainly orchestrates our deployments using the APIs provided by the HashiCorp stack.
Current deployment strategy
Any merge into staging should trigger a job on jenkins.wealthfit.com that deploys the new changes. Similarly, any merge into master should trigger the same job on production-jenkins.wealthfit.com.
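The branch-to-Jenkins mapping described above can be written down as a small shell helper. This is only a sketch of the mapping as stated here; the function name `jenkins_host_for_branch` is made up for illustration and is not part of any existing tooling.

```shell
#!/bin/sh
# Hypothetical helper: map a merged branch to the Jenkins host that deploys it.
jenkins_host_for_branch() {
  case "$1" in
    staging) echo "jenkins.wealthfit.com" ;;
    master)  echo "production-jenkins.wealthfit.com" ;;
    *)
      # No other branch is described as triggering a deployment.
      echo "no deployment job is described for branch '$1'" >&2
      return 1
      ;;
  esac
}
```

For example, `jenkins_host_for_branch staging` prints the staging Jenkins host, which is where to look when a merge does not seem to have deployed.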
What is Truthseeker
In the root of the peacemaker repository live two Nomad template files, one for each environment. These templates are meant to be used with consul-template to authenticate and pull K/V pairs and secrets from Consul and Vault. To help with this process, there is a utility library called Truthseeker that does three things:
- Builds Terraform templates and Nomad jobs via consul-template, securely pulling keys and secrets from Vault and Consul.
- Provides helper methods to update Consul's K/V store and run Nomad jobs.
- Provides TruthKeeper, a GenServer used as a data store to help determine the next available load balancer and UI ports to run Fabio with.
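To make the "next available port" idea concrete, here is a minimal shell sketch of the selection logic. It is not TruthKeeper's implementation (TruthKeeper is an Elixir GenServer); the function name `next_free_port` and the starting port used in the example are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: print the first port >= BASE that is not in the used list.
# Usage: next_free_port BASE [USED_PORT...]
next_free_port() {
  candidate=$1
  shift
  while :; do
    in_use=0
    for used in "$@"; do
      if [ "$used" -eq "$candidate" ]; then
        in_use=1
        break
      fi
    done
    if [ "$in_use" -eq 0 ]; then
      echo "$candidate"
      return 0
    fi
    # Candidate is taken, so try the next port up.
    candidate=$((candidate + 1))
  done
}
```

Calling `next_free_port 9999 9999 10000 10002` would skip the two occupied ports and print the first gap. A data store such as TruthKeeper would be the place that remembers which ports are already handed out.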
What does this mean?
1. Previously we had a multi-branch deployment feature. Any PR with development as its base branch had AWS resources configured to create a separate environment with the PR's code deployed. The resources were configured using Terraform; the modules can be found in the Pathfinder repo, though this feature is currently disabled.
   a) Register the PR app under Fabio so that we can set a target group to be used by the ALB in AWS (also deprecated).
2. To achieve rolling releases with minimal downtime during deployments, we generate random strings in Consul's K/V store that are set as the `sname` for each of the Elixir nodes. This ensures that different API versions don't cluster together, since we use canary deployments as our strategy (eg: Peacemaker v1 only clusters with other Peacemaker v1 nodes).
3. Fetching secrets from Vault. The Nomad template contains only the path to where the secret lives in Vault. Truthseeker helps authenticate against Vault and fetch the specific secrets using consul-template under the hood.
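The random `sname` idea can be sketched in shell as follows. This is an illustration under assumptions, not Truthseeker's code: the `peacemaker_` prefix, the suffix length, and the function names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: build a random short node name per deployment.

# Print N random bytes as lowercase hex (2N characters).
random_suffix() {
  bytes=${1:-4}
  od -An -N "$bytes" -tx1 /dev/urandom | tr -d ' \n'
}

# Print an sname such as peacemaker_1a2b3c4d (prefix is an assumption).
node_sname() {
  printf 'peacemaker_%s\n' "$(random_suffix 4)"
}

# Storing it would be a Consul K/V write, for example (key path is hypothetical):
#   consul kv put peacemaker/sname "$(node_sname)"
```

Because every deployment gets a fresh name, nodes from the old and new versions present different names and so do not join the same cluster during a canary rollout.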
Troubleshooting
- Nomad logs can be found using the `nomad` bin, ie:

      nomad job status peacemaker

      ID            = peacemaker
      Name          = peacemaker
      Submit Date   = 2022-02-10T12:43:41-08:00
      Type          = service
      Priority      = 50
      Datacenters   = us-west-2a,us-west-2b,us-west-2c
      Status        = running
      Periodic      = false
      Parameterized = false

      Summary
      Task Group      Queued  Starting  Running  Failed  Complete  Lost
      peacemaker-api  0       0         3        0       18        0

      Latest Deployment
      ID          = 2c1e9d49
      Status      = successful
      Description = Deployment completed successfully

      Deployed
      Task Group      Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy  Progress Deadline
      peacemaker-api  true         true      3        3         3       3        0          2022-02-10T20:54:32Z

      Allocations
      ID        Node ID   Task Group      Version  Desired  Status   Created   Modified
      1fc09ca7  298c4ff9  peacemaker-api  336      run      running  7d2h ago  7d2h ago
      9d0fa8e1  03b9097f  peacemaker-api  336      run      running  7d2h ago  38m28s ago
      c527dadb  46b2652b  peacemaker-api  336      run      running  7d2h ago  7d2h ago

      ubuntu@ip-69-0-2-37:~$ nomad alloc logs 9d0fa8e1

  - `nomad alloc logs <allocation-id>`, ex:

        Server: peacemaker.wealthfit.com:80 (http)
        Request: PUT /api/account/password
        ** (exit) an exception was raised:
            ** (RuntimeError) cannot encode association :allowed_courses from Peacemaker.Account to JSON because the association was not loaded.

            You can either preload the association:

                Repo.preload(Peacemaker.Account, :allowed_courses)

            Or choose to not encode the association when converting the struct to JSON by explicitly listing the JSON fields in your schema:

                defmodule Peacemaker.Account do
                  # ...

                  @derive {Jason.Encoder, only: [:name, :title, ...]}
                  schema ... do

                (ecto 3.5.8) lib/ecto/json.ex:4: Jason.Encoder.Ecto.Association.NotLoaded.encode/2
                (peacemaker 3.3.5-rc.5) lib/peacemaker/accounts/account.ex:65: Jason.Encoder.Peacemaker.Account.encode/2
                (jason 1.2.2) lib/encode.ex:172: Jason.Encode.map_naive/3
                (jason 1.2.2) lib/encode.ex:35: Jason.Encode.encode/2
                (jason 1.2.2) lib/jason.ex:197: Jason.encode_to_iodata!/2
                (phoenix 1.5.9) lib/phoenix/controller.ex:776: Phoenix.Controller.render_and_send/4
                (peacemaker 3.3.5-rc.5) lib/peacemaker_web/controllers/api/account_controller.ex:1: PeacemakerWeb.AccountController.action/2
                (peacemaker 3.3.5-rc.5) lib/peacemaker_web/controllers/api/account_controller.ex:1: PeacemakerWeb.AccountController.phoenix_controller_pipeline/2
                (phoenix 1.5.9) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
                (peacemaker 3.3.5-rc.5) lib/peacemaker_web/endpoint.ex:1: PeacemakerWeb.Endpoint.plug_builder_call/2
                (peacemaker 3.3.5-rc.5) lib/peacemaker_web/endpoint.ex:3: anonymous fn/3 in PeacemakerWeb.Endpoint."call (overridable 3)"/2
                (appsignal 2.1.9) lib/appsignal/instrumentation.ex:10: Appsignal.Instrumentation.instrument/1
                (peacemaker 3.3.5-rc.5) lib/peacemaker_web/endpoint.ex:1: PeacemakerWeb.Endpoint."call (overridable 4)"/2
                (peacemaker 3.3.5-rc.5) lib/plug/error_handler.ex:65: PeacemakerWeb.Endpoint.call/2
                (phoenix 1.5.9) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4
                (cowboy 2.9.0) /opt/app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2
                (cowboy 2.9.0) /opt/app/deps/cowboy/src/cowboy_stream_h.erl:306: :cowboy_stream_h.execute/3
                (cowboy 2.9.0) /opt/app/deps/cowboy/src/cowboy_stream_h.erl:295: :cowboy_stream_h.request_process/3
                (stdlib 3.15.2) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

- Nomad UI is also available via `tunnel-[staging,production]-nomad-[a-c]`, or check out the SSH tunnel steps under Manual Deployments for the under-the-hood method.
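When several allocations are running, it can help to pull the allocation IDs out of the `nomad job status` output instead of copying them by hand. The following is a small sketch that parses the Allocations table shown above; the function name `running_allocs` is hypothetical, and it assumes task group names contain no spaces so that Status is the sixth column.

```shell
#!/bin/sh
# Hypothetical helper: read `nomad job status <job>` output on stdin and
# print the ID of every allocation whose Status column is "running".
running_allocs() {
  awk '
    /^Allocations/               { in_allocs = 1; next }  # start of the allocation table
    in_allocs && $1 == "ID"      { next }                 # skip the column header row
    in_allocs && $6 == "running" { print $1 }             # column 6 is Status
  '
}

# Example (not executed here):
#   nomad job status peacemaker | running_allocs | while read -r id; do
#     nomad alloc logs "$id"
#   done
```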
Errors
Tips & Tricks
This script pings the /_internal/version endpoint of the Peacemaker API every second. This is helpful during deployments to confirm that the new version is deployed. You should also be able to watch the version slowly roll over.
    while true; do curl -k https://peacemaker.wealthfit.com/_internal/version; sleep 1; done

Adding Secrets:
There is a tool in the design-systems repo, under the wf npm run script, that can be used to add secrets to Vault. Otherwise you can manually open an SSH tunnel (via tunnel-[staging,production]-vault-[a-c]) to any of the Vault nodes on port 8200. (Note: SSL isn't set up around Vault due to time constraints, so make sure to access Vault over http.)
The secret format looks something like the following:
    {{with secret "secret/mux"}}{{.Data.access_token_id}}{{end}}

which can be read as the following:

    {{with secret "[path_to_]/[vault_secret]"}}{{.Data.[key]}}{{end}}
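To avoid typos when adding a new secret reference to a template, the expression can be generated from its two parts. This is a minimal sketch; `vault_secret_expr` is a hypothetical helper name, and it only produces the text of the expression without talking to Vault.

```shell
#!/bin/sh
# Hypothetical helper: print the consul-template expression for one key
# of a Vault secret.
# Usage: vault_secret_expr SECRET_PATH KEY
vault_secret_expr() {
  printf '{{with secret "%s"}}{{.Data.%s}}{{end}}\n' "$1" "$2"
}
```

`vault_secret_expr secret/mux access_token_id` reproduces the example above. A template file containing such expressions would then be rendered by consul-template, for instance with its `-template "in.tpl:out"` and `-once` options (the file names here are placeholders).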
WARNING: UNPROTECTED PRIVATE KEY FILE!
SSH prints this error when the private key file's permissions are too open. Fix: `chmod 400 ~/.ssh/private_key_file_here.pem` (stackoverflow reference)
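The fix can be wrapped in a small function that also shows the resulting permissions, so it is easy to confirm the key is now owner-read-only. This is a sketch; `fix_key_perms` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: restrict a private key to owner-read-only and
# print the resulting permission string.
fix_key_perms() {
  key=$1
  if [ ! -f "$key" ]; then
    echo "key file not found: $key" >&2
    return 1
  fi
  chmod 400 "$key"
  # The first 10 characters of `ls -l` are the file type and mode bits.
  ls -l "$key" | cut -c1-10
}
```

For example, `fix_key_perms ~/.ssh/private_key_file_here.pem`.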
Manual Deployments (written in 2018, but still technically how things work under the hood; some items may be outdated)
Prerequisites
Configure SSH
Copy the following into `~/.ssh/config`
```
Host pathfinder-staging-bastion
  HostName ec2-18-205-194-8.compute-1.amazonaws.com
  User ec2-user
  Port 22
  IdentityFile ~/.ssh/wealthfit-staging-pathfinder.pem
  ForwardAgent yes
  GSSAPIAuthentication no
  PasswordAuthentication no
  ChallengeResponseAuthentication no
  StrictHostKeyChecking no
  UserKnownHostsFile=/dev/null
  GatewayPorts yes

Host peacemaker-staging-a
  ForwardAgent yes
  UserKnownHostsFile=/dev/null
  GatewayPorts yes
  User ubuntu
  Port 22
  ProxyCommand ssh pathfinder-staging-bastion nc 10.1.1.233 22
  IdentityFile ~/.ssh/wealthfit-staging-pathfinder.pem
  StrictHostKeyChecking no

Host peacemaker-staging-b
  ForwardAgent yes
  User ubuntu
  Port 22
  ProxyCommand ssh pathfinder-staging-bastion nc 10.1.2.11 22
  IdentityFile ~/.ssh/wealthfit-staging-pathfinder.pem
  StrictHostKeyChecking no

Host pathfinder-staging-consul-a
  ForwardAgent yes
  UserKnownHostsFile=/dev/null
  GatewayPorts yes
  User ubuntu
  Port 22
  ProxyCommand ssh pathfinder-staging-bastion nc 10.1.1.126 22
  IdentityFile ~/.ssh/wealthfit-staging-pathfinder.pem
  StrictHostKeyChecking no

Host pathfinder-staging-consul-b
  ForwardAgent yes
  UserKnownHostsFile=/dev/null
  GatewayPorts yes
  User ubuntu
  Port 22
  ProxyCommand ssh pathfinder-staging-bastion nc 10.1.2.204 22
  IdentityFile ~/.ssh/wealthfit-staging-pathfinder.pem
  StrictHostKeyChecking no

Host pathfinder-staging-nomad-a
  UserKnownHostsFile=/dev/null
  GatewayPorts yes
  ForwardAgent yes
  User ubuntu
  Port 22
  ProxyCommand ssh pathfinder-staging-bastion nc 10.1.1.143 22
  IdentityFile ~/.ssh/wealthfit-staging-pathfinder.pem
  StrictHostKeyChecking no

Host pathfinder-staging-nomad-b
  ForwardAgent yes
  UserKnownHostsFile=/dev/null
  GatewayPorts yes
  User ubuntu
  Port 22
  ProxyCommand ssh pathfinder-staging-bastion nc 10.1.2.129 22
  IdentityFile ~/.ssh/wealthfit-staging-pathfinder.pem
  StrictHostKeyChecking no
```

Caveats
Note: You may change the Host value (the alias on each `Host` line, not HostName) to whatever you want. This is the name used when you ssh into the container (eg: ssh peacemaker-staging-a).
It's important that the private IP addresses are correct. During development, containers may be destroyed/recreated with new private IP addresses.
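Since stale private IPs are the most likely reason a host stops being reachable, it can be handy to list which IP each alias currently points at. The sketch below reads an SSH config laid out like the one above and prints `alias ip` pairs taken from the `nc` target in each ProxyCommand; `list_proxy_ips` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "<host alias> <private ip>" for every Host
# entry whose ProxyCommand hops through `nc <ip> <port>`.
# Usage: list_proxy_ips [CONFIG_FILE]   (defaults to ~/.ssh/config)
list_proxy_ips() {
  awk '
    $1 == "Host"         { host = $2 }   # remember the current alias
    $1 == "ProxyCommand" {
      for (i = 2; i < NF; i++) {
        if ($i == "nc") { print host, $(i + 1) }   # word after nc is the IP
      }
    }
  ' "${1:-$HOME/.ssh/config}"
}
```

The output can then be compared against the private IPs of the instances that are currently running.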
Run this command in your terminal:

    ssh -L 4646:localhost:4646 pathfinder-staging-nomad-a -N

This opens a tunnel that forwards any requests on your localhost:4646 to port 4646 on pathfinder-staging-nomad-a, so when you open localhost:4646 in the browser, you should be able to view the Nomad Web UI.
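The same port-forward pattern applies to the other hosts in the SSH config. As a sketch, a helper can build the command for any alias and port; `tunnel_cmd` is a hypothetical name, and it only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helper: print the ssh command that forwards a local port
# to the same port on a remote host alias from ~/.ssh/config.
# Usage: tunnel_cmd HOST_ALIAS PORT
tunnel_cmd() {
  host=$1
  port=$2
  echo "ssh -L ${port}:localhost:${port} ${host} -N"
}
```

`tunnel_cmd pathfinder-staging-nomad-a 4646` prints the Nomad command shown above. For Consul, `tunnel_cmd pathfinder-staging-consul-a 8500` would be the equivalent, assuming Consul is listening on its default HTTP port.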
Running API Job
- `cd` to the root of the project
- `docker build -t wealthfit/peacemaker:0.0.0 .`
  - This will create a docker image named `wealthfit/peacemaker` with the tag of `0.0.0`. Versioning control processes will be addressed in the near future. For the time being, let's avoid bumping this version tag. You can still create other tags (eg: `0.0.0-test`, `test`, `foo`)
- Validate that the docker build successfully runs
  - `docker run wealthfit/peacemaker:0.0.0`
  - secrets.prod.exs will expect a `DATABASE_URL` environment variable. The defaulted DATABASE_URL in the Dockerfile points to our production RDS. This means if we run our docker image locally without editing the DATABASE_URL, the image will fail when running migrations.
- Update `api.nomad` to use the new docker image tag.
- Validate that `nomad job plan api.nomad` is what is expected.
- `nomad job run api.nomad`
  - If you view the Nomad Web UI, you should be able to see the job running. Any logs / debugging can be done through the web UI upon failure.
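The steps above can be collected into a dry-run script that prints each command in order for a given tag, which makes it easier to review before running anything against the cluster. This is a sketch under assumptions: the function names are hypothetical, and passing `DATABASE_URL` with `-e` is one possible way to override the Dockerfile default for a local run.

```shell
#!/bin/sh
# Hypothetical dry-run sketch of the manual deployment steps.
IMAGE="wealthfit/peacemaker"
JOB_FILE="api.nomad"

# Print the commands for one deployment, in order, without running them.
deploy_commands() {
  tag=${1:-0.0.0}
  echo "docker build -t ${IMAGE}:${tag} ."
  # Override DATABASE_URL so a local run does not target production RDS.
  echo "docker run -e DATABASE_URL=\"\$DATABASE_URL\" ${IMAGE}:${tag}"
  echo "nomad job plan ${JOB_FILE}"
  echo "nomad job run ${JOB_FILE}"
}

# Translate the exit code of `nomad job plan` into a short message.
describe_plan_exit() {
  case "$1" in
    0)   echo "no allocations would be created or destroyed" ;;
    1)   echo "allocations would be created or destroyed" ;;
    255) echo "error determining plan results" ;;
    *)   echo "unexpected exit code: $1" ;;
  esac
}
```

Note that `nomad job plan` exits with 1 when the plan contains changes, so a script running under `set -e` should not treat that exit code as a failure.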
Future Notes
I am hoping... that this document will eventually be deprecated once the CI/CD pipeline is complete. The ideal workflow is to have developers just push code, and let the infrastructure handle the rest. This process is a bit convoluted right now, and I am planning to simplify this with automation pipelines. I don't foresee this process scaling well because of potential security risks.