Best practices for designing and implementing effective workflows in software development.
In today's fast-paced software development environment, changes are deployed at an hourly rate. To keep up with this rapid pace, it's essential to have well-defined automation that notifies you when changes occur, so you can always stay informed and respond quickly.
The best bounty hunters in the world use custom-developed automation. If you don't trust me, check out some of the references I found while following well-known hackers in the bug bounty community.
In this article, we will go through some best practices for designing and implementing effective workflows, so you can maximize the chances of finding bugs.
Each section introduces an overall idea, followed by one or more bad and best practices.
While reading this article, please keep the Unix philosophy in mind. If you are not familiar with it, Douglas McIlroy, the head of the Bell Labs Computing Sciences Research Center, summarized it as:
"Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."
The workflow system is designed in such a way that anyone following these principles can easily implement and maintain their workflows.
Disclaimer: If you are stubborn like me, you won't take my words seriously unless I prove to you with examples why something is bad, and why something is good. If you are like that, you are in good hands, because the rest of this post will dive deeper into every best practice, and show you reasons why I designed the workflows in this way.
Let's get started!
This section goes hand in hand with the Unix philosophy.
Write programs that do one thing and do it well.
Bad practice: Combine everything in a single scan.
Best practice: Break down the scan into smaller, focused scans.
Let's start with the bad example, and break it down based on this principle.
Example (please, do not do this):
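Conceptually, a monolithic scan chains every tool in sequence, roughly like this shell sketch (the domain and file names are illustrative):

```bash
# One scan doing everything: enumerate, port scan, probe, and vulnerability scan
subfinder -d example.com -silent -o subdomains.txt
nmap -iL subdomains.txt -oA nmap-results
httpx -l subdomains.txt -silent -o live.txt
nuclei -l live.txt -o nuclei-results.txt
```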
While you can certainly combine everything into a single scan, this approach comes with several drawbacks.

Now, imagine nmap fails. Even if you've uploaded the subdomain results somewhere outside of the platform, subsequent steps like running httpx, performing directory scans, or launching nuclei won't execute, because the runner can't determine whether a step depends on the previous one.
Sure, you could use allow_failure: true to continue even if nmap fails. Since nmap is a leaf of that path (nothing depends on it), allow_failure would prevent its failure from stopping the whole scan. But since the failure no longer fails the scan (the rest of it continues to run), you don't get notified that anything went wrong, and you won't know about it unless you manually inspect the successful run.
But consider httpx failing to find live domains. Since httpx has steps depending on it, how do we determine which steps should be skipped?
Okay, you can't currently use the if expression to figure it out (although you will soon be able to), but you can always create a workaround: at the beginning of the step, test whether some file exists, which tells you whether you should even attempt to execute the step. But, and this is a big BUT, it makes the step more complicated, and it suffers from the same problem as before: you don't know when the step fails.
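A sketch of that workaround, assuming the previous step was supposed to produce live.txt:

```bash
# Bail out early if the previous step's output never materialized
if [ ! -s live.txt ]; then
  echo "live.txt is missing or empty, skipping this step"
  exit 0  # the step still "succeeds", so nobody gets notified
fi
nuclei -l live.txt -o nuclei-results.txt
```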
As you can now clearly see, the workflow is powerful enough to allow you to do anything you want. But making it complicated just for the sake of making it work is not the right approach.
If you have any doubts about how to design the workflow nicely, you can always reach out on the Discord server, or use any of the contact methods listed on the support page.
Now, hopefully I convinced you that the monolithic scan is not the way to go. So, let's see how to do it the right way.
Example:
Let's walk through the advantages:

- If a scan doesn't depend on subfinder, it could run at the same time as subfinder.
- If subfinder fails, we can create an expression that searches through previous successful runs, and use the last known good result. That way, we run the liveness scan on the latest known good result.

When designing steps within a scan, the same principle applies.
Bad practice: Combine multiple actions into a single step.
Best practice: Each step should perform a single, well-defined action.
Let's start from the bad example, and break it down based on this principle.
Example (bad practice):
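For instance, a single step that probes for live hosts, runs nuclei, and extracts the interesting findings all at once (a sketch with illustrative file names):

```bash
# One step, three unrelated responsibilities
httpx -l subdomains.txt -silent -o live.txt
nuclei -l live.txt -o nuclei-results.txt
grep -i "critical" nuclei-results.txt > critical-findings.txt || true
```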
While this doesn't seem as bad as the monolithic scan, it still has several drawbacks: when something fails, it is harder to tell which action broke, and the logs for unrelated actions end up mixed together.
Example (best practice):
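The same work, split so that each action is its own step (a sketch; each block below would be a separate step in the scan):

```bash
# Step: probe for live hosts
httpx -l subdomains.txt -silent -o live.txt
```

```bash
# Step: run nuclei against the live hosts
nuclei -l live.txt -o nuclei-results.txt
```

```bash
# Step: extract the critical findings
grep -i "critical" nuclei-results.txt > critical-findings.txt || true
```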
Here are the reasons why this approach is better: each step has a single, clear responsibility, a failure points directly at the action that broke, and the individual steps are easier to reuse and rearrange.
::b-h-callout {variant="tip"}
Just be careful not to overdo it. As you can see, my single step runs the artifact download and the unzip. The reason is that I want this step to download something and prepare it. There are only a few commands, and logically, it does one thing (pulls down the artifact and prepares it for the next step).
::
When designing workflows, it's crucial to clearly define the inputs and outputs of each scan and step. This clarity helps in understanding the flow of data and dependencies between different parts of the workflow.
It is much easier to reason about the workflow when you can see what each scan produces, and what each scan consumes.
This tip goes hand in hand with the Unix philosophy: "Write programs to work together".
If you clearly know what each scan produces, and what each scan consumes, you can easily chain them together.
Bad practice: Rely on implicit data you assume exists on your machine.
Best practice: Use a single source of truth for data exchange between scans.
Let's say you have a scan that produces multiple artifacts, but to save costs, you decide to store them on your machine. Although this is certainly possible, imagine you introduce a new runner on a new machine, and the next scan is now scheduled on that machine. The path doesn't exist anymore, and you forgot that the second scan relies on the assumption that the artifact is present on the machine.
This becomes hard to debug, and hard to maintain.
Instead, every configuration parameter should be stored in the blob storage, each job should output the artifacts it produces, and every subsequent scan should download the artifacts it needs.
If you want to reduce the storage costs, you can:

- Use the expires_in parameter in the artifact definition.
- Add a cleanup step that prunes old runs.

Such a cleanup step conditionally executes only if there are at least 2 successful runs of the subfinder scan. It then deletes the second-latest job, keeping only the latest successful run.
When designing workflows, it's important to consider the format of the data being exchanged between different steps. The purpose of automation is to enable machines to process data, so it makes sense to use formats that are easy for machines to parse and understand, and to allow you to easily manipulate the data.
As mentioned earlier, the Unix philosophy states:
Write programs to handle text streams, because that is a universal interface.
Use a text format that is easy to parse programmatically. I personally prefer JSON, because of the powerful jq command-line JSON processor.
Bad practice: Use human-readable output formats for inter-step communication.
Best practice: Use machine-readable formats (e.g., JSON, CSV) for data exchange.
Let's start with an example. HTTPX can output data in a human-readable format.
While this format is easy for humans to read, it is not ideal for machines to parse.
For example, httpx -u bountyhub.org -cl -favicon -sc produces a single line containing the URL followed by the status code, content length, and favicon hash, each wrapped in brackets.
To parse the status code 200, you would have to use complex string manipulation techniques, which can be error-prone and hard to maintain.
For example, you might use awk or cut to extract the status code:
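A sketch of that kind of parsing, assuming the status code is the first bracketed field and using -nc so color codes don't get in the way:

```bash
# Input looks roughly like: https://bountyhub.org [200] [12345] [-1234567890]
httpx -u bountyhub.org -cl -favicon -sc -silent -nc | awk -F'[][]' '{print $2}'
```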
Now, imagine trying to filter these results. Let's say you only want lines where the status code is 200. You would have to use even more string manipulation:
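Something along these lines, which only works as long as the bracket layout never changes:

```bash
# Keep only the hosts whose first bracketed field is 200
httpx -l subdomains.txt -sc -silent -nc | awk -F'[][]' '$2 == "200" {print $1}'
```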
Instead, you can just as easily do this:
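A sketch using httpx's JSON output, written straight to a file so it can later become an artifact:

```bash
httpx -u bountyhub.org -cl -favicon -sc -silent -json > httpx.json
```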
And later use jq to filter the results:
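For example, printing the URLs that returned a 200 (the field names are taken from httpx's JSON output and may differ between versions):

```bash
jq -r 'select(.status_code == 200) | .url' httpx.json
```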
When designing workflows, it's important to handle sensitive information and target-specific data appropriately. It is easy to hard-code such information directly into the workflow. With that said, let's walk through some best practices.
Bad practice: Hard-code sensitive and target-specific information directly into the workflow.
Best practice: Use secrets and vars to manage sensitive and target-specific data.
There are several reasons why you should use dynamic variables. Take the BOUNTYHUB_TOKEN as an example: instead of modifying the workflow, which effectively restarts the workflow schedule, you can simply modify the project secrets/variables, and the next run will use the updated information.

So, instead of specifying the list of targets to scan with subfinder like the following:
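Something like this, with the scope baked directly into the step (the domains are placeholders):

```bash
# Targets hard-coded into the workflow itself
printf 'bountyhub.org\nexample.com\n' > scope.txt
subfinder -dL scope.txt -silent -o subdomains.txt
```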
You should use vars:
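A sketch using the {{ vars.SCOPE }} expression, so the step never has to change when the scope does:

```bash
# The scope is resolved from the project variable at runtime
echo "{{ vars.SCOPE }}" > scope.txt
subfinder -dL scope.txt -silent -o subdomains.txt
```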
Bad practice: Store vars in the format tools expect.
Best practice: Store vars in a generic format, and transform them as needed within the workflow.
Consider the scope variable, which I use extensively on every scan, as a case study.
One approach we could take is to store the scope in the format a tool expects. The best example I could think of (although it is not a great one) is using the CSV format for the SCOPE var.
Now, let's say I want to take a look at that variable. I get a CSV blob that is hard to read in the UI. I'd much rather see a newline-separated list of domains, the way scope is usually presented. It is easier to read, easier to modify, and easy to transform.
Now, when I add another tool, if it doesn't accept a list of domains (which is rare) but rather a CSV format, I can just run
echo "{{ vars.SCOPE }}" | tr '\n' ',' | sed 's/,$//', and get the format I
need. Most tools work with lines, so it is also easier to trim parts down, or take
only the top-level domains, without having to split a single line into multiple
lines to do such processing.
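As a sketch, here is the kind of per-line processing a newline-separated scope makes trivial (the apex-domain extraction is naive and assumes single-label TLDs):

```bash
# Deduplicated apex domains from the newline-separated scope
echo "{{ vars.SCOPE }}" | awk -F. 'NF >= 2 {print $(NF-1)"."$NF}' | sort -u
```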
Bad practice: Use vars for sensitive information, such as BOUNTYHUB_TOKEN
Best practice: Use secrets for sensitive information.
The difference between vars and secrets is that secrets are only retrieved by the runner at runtime, and are not visible in the UI.
By using secrets for sensitive information, you reduce the risk of accidental exposure. Not only that, but secrets are also masked in the logs, so if you accidentally print them, they won't be visible.
And since the API for using secrets and vars is basically the same,
vars.VAR_NAME vs secrets.SECRET_NAME, there is no reason to use vars for
sensitive information.
Bad practice: Over-use of vars and secrets for non-target-specific information.
Best practice: Use vars and secrets only for information that is truly target-specific.
Using vars and secrets for non-target-specific information can lead to unnecessary complexity and make it harder to manage the workflow.
Let's say you want to control a tool's output format using vars. You can easily specify a variable, e.g. SUBFINDER_FORMAT, and set it to -json.
However, by changing the format in the project UI, you will most likely invalidate other scans, or even other steps. Since the format is part of the step/scan interface, changing it inevitably means changing the scans/steps that depend on that interface.
By clearly separating Project concerns from Workflow concerns, you can minimize the chance of workflows failing, making them more resilient.
When designing workflows, it's important to consider how the runner itself behaves. This also comes back to the "separation of concerns" principle.
The runner-based environment variables should be configured during runner installation, while workflow-based environment variables should be configured within the workflow.
Bad practice: Expose workflow-specific environment variables during runner installation.
Best practice: Configure workflow-specific environment variables within the workflow itself.
Let's take the BOUNTYHUB_TOKEN as an example. This token has specific
permissions. You should follow the least-privilege principle, and only provide
the token with the permissions it needs for the specific workflow.
The way the runner works is that it sources the runner-based environment variables at startup, and the workflow-specific environment variables during the execution of the workflow.
Let's say you run export BOUNTYHUB_TOKEN="your_patv1" && runner run. Since you
don't know which permissions you will need, you likely specify a token with broad
permissions, covering all possible scenarios. This violates the least-privilege
principle and leaves you exposed in case the token is leaked.
Another issue: let's say the token is about to expire. You then have to SSH into the machine, stop the runner, export the new token, and start the runner again. This is cumbersome and error-prone.
Of course, you can set token to never expire, but I would argue that this is not a good security practice.
Instead, you configure the BOUNTYHUB_TOKEN secret on your project and use the
env field, so you don't have to manually evaluate the expression on every step
that needs it. This brings us to the next practice.
Bad practice: Manually handle authentication and configuration for runner behavior.
Best practice: Use workflow envs to configure runner behavior.
Let's take the bh CLI as our case study.
Since the bh CLI is designed to talk to the BountyHub API, you will likely rely
on it heavily to download artifacts and blobs, dispatch scans, etc.
Instead of manually handling authentication on every step, you can use the
BOUNTYHUB_TOKEN environment variable to authenticate requests.
Example (bad practice):
Example (best practice):
By using workflow envs to configure runner behavior, you can simplify your workflow and reduce duplication.
It is slightly less secure, in the sense that steps that don't require access to the env
will still be able to see the variable, but since you are completely in control of
what is being executed, the trade-off is worth it in my opinion.
When designing workflows, it's important to consider how artifacts are organized and managed. Artifacts are the interface between different scans, so make sure you design them carefully.
Bad practice: Combine all outputs into a single artifact.
Best practice: Separate artifacts based on their purpose and usage.
See an example from the previous blog post Deep dive into httpx.
In that example, we produced several artifacts, each with its own purpose.

Now, if you want to download a specific artifact, you can do so easily, without pulling down everything else with it.

By separating artifacts based on their purpose, you keep each artifact small and focused, and later scans can download exactly what they need.
If we were to combine multiple outputs into a single artifact, we would lose the ability to optimize storage and notification settings for different types of artifacts.
Not only that, but you would unnecessarily extract files that you don't really need, which can cause issues with your workflow down the line.
Bad practice: Schedule workflows without considering their execution time and frequency.
Best practice: Design cron schedules that align with the workflow's purpose and resource usage.
Let's take a simple example: a subdomain enumeration scan, and a liveness probe that depends on its results, both scheduled hourly.
If you run both at the first minute of the hour, you might end up with a situation where the liveness probe runs before the subdomain enumeration is complete. You then get the new result an hour later.
Instead, you are smart, so you know that subdomain enumeration takes around 10 minutes (just an example).
Then your httpx cron job should likely run at the 20th minute of every hour, giving the subdomain enumeration enough time to complete.
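Assuming the schedule field accepts standard cron expressions, that would look something like:

```
# subdomain enumeration (subfinder): top of every hour
0 * * * *

# liveness probe (httpx): 20 minutes past every hour
20 * * * *
```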
By designing cron schedules that consider execution time and frequency, you can ensure that workflows run smoothly and efficiently without unnecessary delays or resource contention.
Bad practice: Over-scheduling workflows, leading to resource contention and potential failures.
Best practice: Schedule workflows based on their actual needs and resource usage.
Avoid scheduling workflows more frequently than necessary. For example, if a workflow only needs to run once a day, don't schedule it to run every hour.

Over-scheduling can lead to resource contention, increased costs, and potential failures due to overlapping executions.
By scheduling workflows based on their actual needs, you can optimize resource usage and ensure that workflows run reliably.
For example, if you are scanning a target that doesn't change often, you can schedule the workflow to run less frequently, such as once every 2 days, or even once a week.
On the other hand, if you are scanning a target that changes frequently, you can schedule the workflow to run more frequently, such as every hour.
When designing workflows, it's important to consider edge cases and error handling.
Let's say your liveness probe runs, but the subdomain enumeration failed for some reason.
Bad practice: Assume that all dependencies will always succeed.
Best practice: Implement error handling and fallback mechanisms for edge cases.
In the liveness probe example, you can check if the subdomain enumeration scan is available before proceeding with the liveness check.
By implementing error handling and considering edge cases, you can create more robust and reliable workflows that can gracefully handle unexpected situations.
This philosophy applies to everything. Make sure to think about edge cases, and handle them appropriately. The workflow modifications should ideally be rare, so put the time into thinking about edge cases, and handling them upfront.
Stdout and stderr are often the mechanisms tools use to output their data. They are the primary way you manually inspect something, but there is a flaw in relying on them.

You often want to go back, investigate the output, and parse it programmatically. You might want to transform it for another tool to consume.

I have often found it best to make sure the tool output is written to a file, either by using shell redirection or the tee command.

But let's first start with why you should always output to a file.
Bad practice: Rely on stdout/stderr for data exchange between steps.
Best practice: Use files for data exchange between steps.
Stdout cannot be re-used easily. You would have to capture it, and write it to a file, which adds complexity to the workflow.
So you are left with two options:

1. Write the output to a file as the command runs.
2. Re-run the command every time you need its output again.

Obviously, the second option is not ideal, since you are re-running the command multiple times, wasting resources, and potentially hitting rate limits. If the command is not deterministic, you might also get different results each time.
So make sure you store the state in a file, since the file will eventually become an artifact.
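A sketch of the two ways to get the output into a file, plain redirection and tee:

```bash
# Plain redirection: output goes only to the file
httpx -l subdomains.txt -silent -json > httpx.json

# tee: output still shows up in the step log and lands in a file for the artifact
subfinder -d bountyhub.org -silent | tee subdomains.txt
```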
Bad practice: Generate excessive output in stdout/stderr.
Best practice: Limit the amount of output generated to essential information only.
The log is designed to provide more information about the execution for debugging. It expires after 2 weeks, and is not meant to be reusable.
Excessive output can lead to logs that are slow to load and hard to read, and it can even push you past the log size limits and fail the job.
By limiting output to essential information, you can improve the readability and usefulness of logs, making it easier to diagnose issues and monitor workflow execution.
The important thing to call out here is a mistake I made: running httpx with the -paths flag on a large scope.
I mistakenly thought that using the -silent flag would, well, silence the output. However, -silent only removes non-essential information; the results themselves are still printed.
This led to my scan failing due to the excessive log output. Not only that, but the UI started lagging when it parsed the logs for that job.
The UI can easily be fixed, but the log size restriction is specifically designed to limit misuse or malicious use of the logging system.
By being mindful of the output size and content, you can create more efficient and effective workflows.
I fixed this by simply redirecting the stdout to /dev/null, since I only
cared about the output file.
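Roughly like this (the flags are illustrative; -o keeps the results I actually care about):

```bash
# Write the findings to a file, throw away the noisy stdout
httpx -l subdomains.txt -silent -json -o httpx-results.json > /dev/null
```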
By being mindful of such limitations, you can create more efficient and effective workflows, that will serve you well in the long run.
By following these best practices, you can design and implement effective workflows that are maintainable, scalable, and resilient to changes and errors.
Maintainable, scalable and resilient workflows can easily be used as templates. So when you decide to hack on a new target, you can simply copy the workflow template, set the project variables/secrets, and start hunting right away!
I hope you found this article useful. If you have any questions, feel free to reach out to me on our Discord server or use any of the contact methods listed on the support page.
Happy hunting! 🎯