Getting Started
Haetae is an incremental task runner.
The task can be test, lint, build, or anything.
It can be used in any project, no matter what language, framework, test runner, linter/formatter, build system, or CI you use.
In this Getting Started article, we start with an example of incremental testing.
Why?
Let's say you're building a calculator project, named 'my-calculator'.
my-calculator
├── package.json
├── src
│   ├── add.js
│   ├── exponent.js
│   ├── multiply.js
│   └── subtract.js
└── test
    ├── add.test.js
    ├── exponent.test.js
    ├── multiply.test.js
    └── subtract.test.js
The dependency graph is like this: exponent.js depends on multiply.js, which depends on add.js, and so on.
When testing, we should take the dependency graph into account.
We do NOT have to run every test file (*.test.js) for every tiny change; that wastes your time and CI resources.
Rather, we should test incrementally, which means testing only the files affected by the changes.
For example, when multiply.js is changed, test only exponent.test.js and multiply.test.js.
When add.js is changed, test all files (exponent.test.js, multiply.test.js, subtract.test.js and add.test.js).
When a test file (e.g. add.test.js) is changed, just execute that test file itself (e.g. add.test.js).
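The rule above can be sketched by hand as a small graph traversal. This is only an illustration, not how Haetae is implemented. The `imports` map is a hypothetical, hand-written version of the my-calculator graph (it assumes subtract.js imports add.js, as the example above implies), and `dependsOn` and `affectedTests` are made-up helper names.

```javascript
// Hypothetical dependency map for my-calculator: file -> files it imports directly.
const imports = {
  'src/add.js': [],
  'src/subtract.js': ['src/add.js'],
  'src/multiply.js': ['src/add.js'],
  'src/exponent.js': ['src/multiply.js'],
  'test/add.test.js': ['src/add.js'],
  'test/subtract.test.js': ['src/subtract.js'],
  'test/multiply.test.js': ['src/multiply.js'],
  'test/exponent.test.js': ['src/exponent.js'],
}

// True if `file` is `target` or reaches it through direct or transitive imports.
function dependsOn(file, target, seen = new Set()) {
  if (file === target) return true // every file depends on itself
  if (seen.has(file)) return false // already visited (also guards against cycles)
  seen.add(file)
  return (imports[file] ?? []).some((dep) => dependsOn(dep, target, seen))
}

// Test files affected by at least one of the changed files.
function affectedTests(changedFiles) {
  return Object.keys(imports)
    .filter((file) => file.endsWith('.test.js'))
    .filter((file) => changedFiles.some((changed) => dependsOn(file, changed)))
    .sort()
}

console.log(affectedTests(['src/multiply.js']))
// -> ['test/exponent.test.js', 'test/multiply.test.js']
console.log(affectedTests(['test/add.test.js']))
// -> ['test/add.test.js']
```

Maintaining such a map by hand does not scale, which is the motivation for the rest of this article.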
So how can we do this automatically?
Here's where Haetae comes in.
With just a simple config, Haetae can automatically detect the dependency graph and test only the affected files.
(In this article, Jest is used just as an example. You can use any test runner.)
Installation
So, let's install Haetae. (Node 16 or higher is required.)
It doesn't matter whether your project is new or existing (Haetae can be adopted incrementally).
It works well for monorepos too (covered in another part of the docs).
Literally any project is suitable.
npm install --save-dev haetae
Are you developing a library (e.g. plugin) for Haetae?
You can depend on @haetae/core, @haetae/utils, @haetae/git, @haetae/javascript, and @haetae/cli independently. Note that the package haetae includes all of them.
Basic configuration
Now, we are ready to configure Haetae.
Let's create a config file haetae.config.js.
my-calculator
├── haetae.config.js # <--- Haetae config file
├── package.json
├── src # contents are omitted for brevity
└── test # contents are omitted for brevity
TypeScript Support
If you want to write the config in TypeScript, name it haetae.config.ts.
Then install ts-node, which is an optional peerDependencies entry of haetae (from @haetae/core).
CJS/ESM
Haetae supports both CJS and ESM projects.
Haetae is written in ESM, but it can be used in CJS projects as well, as long as the config file is ESM.
If your project is CJS, name the config file haetae.config.mjs or haetae.config.mts.
If your project is ESM, name the config file haetae.config.js or haetae.config.ts.
We can write it down like this.
Make sure you have initialized git. Haetae can be used with any other version control system, but git is assumed in this article.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  // Other options are omitted for brevity.
  commands: {
    myTest: {
      run: async () => {
        // An array of changed files
        const changedFiles = await git.changedFiles()
        // An array of test files that (transitively) depend on changed files
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'], // glob pattern
          dependencies: changedFiles,
        })

        if (affectedTestFiles.length > 0) {
          // Equivalent to "pnpm jest /path/to/foo.test.js /path/to/bar.test.js ..."
          // Change 'pnpm jest' to your test runner.
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})
Multiple APIs are used in the config file above.
They all have various options (Check out API docs).
But we are going to use their sensible defaults for now.
The tagged template literal $ in the config above can run arbitrary shell commands.
It is execa's $ function, and Haetae preconfigures its cwd as the directory of the Haetae config file and its stdio as 'inherit', both of which you can override of course.
If it receives a placeholder (e.g. ${affectedTestFiles}) that is an array, it automatically joins the elements with a space (' ').
It has other traits and options as well. Check out execa's API docs for more detail.
import { $ } from 'haetae'
// The following two lines have the same effect
await $`pnpm jest ${affectedTestFiles}`
await $`pnpm jest ${affectedTestFiles.join(' ')}`
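To see the interpolation rule in isolation, here is a minimal sketch of a tagged template function that renders array placeholders joined by spaces. `commandText` is a hypothetical stand-in written for this illustration: it only builds the command text and runs nothing, whereas the real $ comes from execa and has its own argument handling.

```javascript
// Simplified stand-in for `$`: builds the command text instead of running it.
function commandText(strings, ...values) {
  return strings.reduce((text, chunk, i) => {
    if (i >= values.length) return text + chunk
    const value = values[i]
    // Array placeholders are joined with a single space.
    const rendered = Array.isArray(value) ? value.join(' ') : String(value)
    return text + chunk + rendered
  }, '')
}

const affectedTestFiles = ['test/multiply.test.js', 'test/exponent.test.js']
console.log(commandText`pnpm jest ${affectedTestFiles}`)
// prints "pnpm jest test/multiply.test.js test/exponent.test.js"
```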
Credit to google/zx
$ as a tagged template literal was first inspired by google/zx. Thanks!
Then run haetae like below.
$ haetae myTest
(Unless you installed haetae globally, you should execute it through your package manager, e.g. pnpm haetae myTest.)
Note that myTest in the command above is the name of the command we defined in the config file.
You can name it whatever you want. And as you might guess, you can define multiple commands (e.g. myLint, myBuild, myIntegrationTest, etc.) in the config file.
It will print the result like this.
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 May 28 11:06:06 Asia/Seoul (timestamp: 1685239566483)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: 979f3c6bcafe9f0b81611139823382d615f415fd
⎜ branch: main
⎣ specVersion: 1
As this is the first run of the command haetae myTest, git.changedFiles() in the config returns every file tracked by git in your project as a changed file (there are options; check out the API docs after reading this article).
This behavior results in running all of the tests.
js.dependOn() understands direct and transitive dependencies between files by parsing import, require(), etc.
So it can be used to detect which test files (transitively) depend on at least one of the changed files.
js.dependOn can detect multiple formats
ES6+, CJS, TypeScript, JSX, Webpack, CSS preprocessors (Sass, Scss, Stylus, Less), and PostCSS are supported.
For Node.js, Subpath Imports and Subpath Exports are also supported.
For TypeScript, Path Mapping is also supported.
If you use TypeScript or Webpack, check out the API docs and pass additional options like options.tsConfig and/or options.webpackConfig.
js.dependOn vs js.dependsOn vs utils.dependOn vs utils.dependsOn
There are several APIs with similar purposes.
- js.dependOn: For multiple dependents. For the JS ecosystem.
- js.dependsOn: For a single dependent. For the JS ecosystem.
- utils.dependOn: For multiple dependents. General-purpose.
- utils.dependsOn: For a single dependent. General-purpose.
Check out the API docs later for more detail.
Note that it cannot parse dynamic imports (import()).
Dynamic or extra dependencies can be specified with the additionalGraph option, explained later in this article.
my-calculator
├── .haetae/store.json # <--- Generated. Haetae store file
├── haetae.config.js
├── package.json
├── src
└── test
As you may have noticed, the store file .haetae/store.json is generated.
It stores the history of Haetae executions, which makes incremental tasks possible.
For example, the commit ID 979f3c6 printed by our first execution above is the git HEAD that haetae myTest ran on.
This information is logged in the store file to be used later.
Detecting the last commit Haetae ran on successfully
Let's say we made some changes and added 2 commits.
979f3c6 is the last commit Haetae ran on successfully.
0c3b3cc and 1d17a2f are new commits after that.
What will happen when we run Haetae again?
$ haetae myTest
This time, only exponent.test.js and multiply.test.js are executed.
That's because git.changedFiles() automatically returns only the files changed since the last successful execution of Haetae.
For another example, if you modify add.js, then all tests will be executed, because js.dependOn() detects dependencies transitively.
If you modify add.test.js, only the test file itself (add.test.js) will be executed, as every file is treated as depending on itself.
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 May 28 19:03:25 Asia/Seoul (timestamp: 1685268205443)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: 1d17a2f2d75e2ac94f31e53376c549751dca85fb
⎜ branch: main
⎣ specVersion: 1
Accordingly, the new commit 1d17a2f is logged in the store file.
The output above is an example of a successful task.
Conversely, if the test fails, pnpm jest <...>, which we gave to $ in the config, exits with a non-zero exit code.
This makes $ throw an error.
So myTest.run() does not complete successfully, and the store file is not renewed.
This behavior is useful for incremental tasks. The failed test (or any incremental task) will be re-executed later until the problem is fixed.
stdio
The stdio of $ is 'inherit' by default.
This makes the Promise reject when the command fails (e.g. non-zero exit code).
If you set stdio differently, you should decide for yourself whether to throw an error.
For example, when stdio is set to 'pipe', check the failed and/or stderr properties yourself.
env configuration
Sometimes we need to separate several environments.
Simple environment variable example
For example, the logic of your project might behave differently depending on the environment variable $NODE_ENV.
So the history of an incremental task should also be recorded separately for each environment.
Let's add env to the config file to achieve this.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  commands: {
    myTest: {
      env: { // <--- Add this
        NODE_ENV: process.env.NODE_ENV,
      },
      run: async () => { /* ... */ },
    },
  },
})
The key name NODE_ENV is just an example. You can name it as you want.
From now on, the store file will manage the history of each environment separately.
For example, if $NODE_ENV can have two values, 'development' or 'production', then Haetae will manage two incremental histories, one for each environment.
You don't have to care about the past history of myTest executed without env.
When a command is configured without env, it's treated as if configured with env: {}, which is totally fine.
So there will be 3 envs recorded in the store file:
{}
{ NODE_ENV: 'production' }
{ NODE_ENV: 'development' }
Though we changed the schema of env in the config from {} to { NODE_ENV: 'development' | 'production' }, the history of env: {} already recorded in the store file is NOT automatically deleted.
It just stays in the store file.
This behavior is safe because incremental histories are managed per env, so don't worry about leftover history.
If you care about disk space, configuring the auto-removal of obsolete history is covered later in this article.
Multiple keys
You can add more keys to the env object.
For instance, let's change the config to this.
import assert from 'node:assert/strict' // `node:` protocol is optional
import { $, configure, git, utils, js, pkg } from 'haetae'
import semver from 'semver'

export default configure({
  commands: {
    myTest: {
      env: async () => { // <--- Changed to async function from object
        assert(['development', 'production'].includes(process.env.NODE_ENV))
        return {
          NODE_ENV: process.env.NODE_ENV,
          jestConfig: await utils.hash(['jest.config.js']),
          jest: (await js.version('jest')).major,
          branch: await git.branch(),
          os: process.platform,
          node: semver.major(process.version),
          haetae: pkg.version.major,
        }
      },
      run: async () => { /* ... */ },
    },
  },
})
The object has more keys than before, named jestConfig, jest, branch and so on.
In this example, if any of $NODE_ENV, the Jest config file, the major version of Jest, the git branch, the OS platform, the major version of Node.js, or the major version of the package haetae changes, it's treated as a different environment.
And now env is a function. You can freely write any additional code in it, like the assertion (assert()) in the config above. myTest.env() is executed before myTest.run().
When an error is thrown in myTest.env(), myTest.run() is not executed, and the store file is not renewed.
This is the intended design for incremental tasks.
If you just want to check the value the env function returns, you can use the -e, --env option.
This does not write to the store file, but just prints the value.
$ haetae myTest --env
✔ success Current environment is successfully evaluated for the command myTest
⎡ env:
⎜ NODE_ENV: development
⎜ jestConfig: 642645d6bc72ab14a26eeae881a0fc58e0fb4a25af31e55aa9b0d134160436eb
⎜ jest: 29
⎜ branch: main
⎜ os: darwin
⎜ node: 18
⎜ haetae: 0
⎣ envHash: 203ceac1714279231e82d91614f2ebe50f5b1a7a
Additional dependency graph
Until now, js.dependOn() has been used for automatic detection of the dependency graph.
But sometimes, you need to specify some dependencies manually.
Simple integration test
For example, let's say you're developing a project communicating with a database.
your-project
├── haetae.config.js
├── package.json
├── src
│   ├── external.js
│   ├── logic.js
│   └── index.js
└── test
    ├── data.sql
    ├── external.test.js
    ├── logic.test.js
    └── index.test.js
The explicit dependency graph is like this.
logic.js contains business logic, including communicating with a database.
external.js communicates with a certain external service, regardless of the database.
But there is an SQL file named data.sql for an integration test.
Obviously, it's not (and can't be) imported (e.g. import, require()) by any source code file.
Let Haetae think logic.js depends on data.sql, using additionalGraph.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async () => {
        const changedFiles = await git.changedFiles()
        // A graph of additional dependencies specified manually
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/logic.js'],
              dependencies: ['test/data.sql'],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph, // <--- New option
        })

        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})
Then the implicit dependency graph becomes explicit.
From now on, when the file data.sql is changed, index.test.js and logic.test.js are executed.
As external.test.js doesn't transitively depend on data.sql, it's not executed.
If, contrary to this general and natural flow, you decide that index.test.js should never be affected by data.sql, you can change the config.
// Other content is omitted for brevity
const additionalGraph = await utils.graph({
  edges: [
    {
      dependents: ['test/logic.test.js'], // 'src/logic.js' to 'test/logic.test.js'
      dependencies: ['test/data.sql'],
    },
  ],
})
With this, data.sql doesn't affect index.test.js anymore.
But I recommend this practice only when you're firmly sure that index.test.js will not be related to data.sql, because otherwise you have to update the config again whenever the relation changes.
env vs additionalGraph
The effect of additionalGraph is different from env.
env is like defining parallel universes, where history is recorded separately.
If you place data.sql in env (e.g. with utils.hash()) instead of additionalGraph, every test file will be executed when data.sql changes, unless the change is a rollback to past content that matches a past value of env logged in the store file (.haetae/store.json).
external.js and external.test.js are unrelated to the database.
That's why data.sql is applied as additionalGraph, not as env.
But that depends on the case. In many situations, env is beneficial.
- If data.sql affects 'most' of your integration test files, or
- If it is not clear which test files do and don't depend on data.sql, or the relations change frequently, or
- If data.sql is not frequently changed,
then env is a good place.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  commands: {
    myTest: {
      env: async () => ({
        testData: await utils.hash(['test/data.sql']),
      }),
      run: async () => { /* ... */ }, // without `additionalGraph`
    },
  },
})
Cartesian product
You can specify the dependency graph from a chunk of files to another chunk.
// Other content is omitted for brevity
const additionalGraph = await utils.graph({
  edges: [
    {
      dependents: ['test/db/*.test.js'],
      dependencies: [
        'test/docker-compose.yml',
        'test/db/*.sql',
      ],
    },
  ],
})
This means that any test file under test/db/ depends on any SQL file under test/db/ and on test/docker-compose.yml.
Distributed notation
You don't have to specify a dependent's dependencies all at once. It can be done in a distributed manner.
// Other content is omitted for brevity
const additionalGraph = await utils.graph({
  edges: [
    {
      dependents: ['foo', 'bar'],
      dependencies: ['one', 'two'],
    },
    {
      dependents: ['foo', 'qux'], // 'foo' appears again, and it's fine
      dependencies: ['two', 'three', 'bar'], // 'two' and 'bar' appear again, and it's fine
    },
    {
      dependents: ['one', 'two', 'three'],
      dependencies: ['two'], // 'two' depends on itself, and it's fine
    },
    {
      dependents: ['foo'],
      dependencies: ['one'], // 'foo' -> 'one' appears again, and it's fine
    },
  ],
})
In the third edge, we marked two as depending on two itself.
That's OK, as every file is treated as depending on itself.
So foo depends on foo, bar depends on bar, and so on.
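One way to picture why repeated and self-referencing edges are harmless is to expand the edges into a map of sets. The sketch below is an assumption about a possible representation, not the actual return value of utils.graph; `buildGraph` is a hypothetical helper, and glob patterns are not resolved here.

```javascript
// Expand edges into a map: dependent -> Set of its direct dependencies.
// Each edge is a cartesian product: every dependent gets every dependency.
function buildGraph(edges) {
  const graph = new Map()
  const node = (name) => {
    if (!graph.has(name)) graph.set(name, new Set([name])) // self-dependency
    return graph.get(name)
  }
  for (const { dependents, dependencies } of edges) {
    for (const dependency of dependencies) node(dependency)
    for (const dependent of dependents) {
      for (const dependency of dependencies) node(dependent).add(dependency)
    }
  }
  return graph
}

const graph = buildGraph([
  { dependents: ['foo', 'bar'], dependencies: ['one', 'two'] },
  { dependents: ['foo', 'qux'], dependencies: ['two', 'three', 'bar'] },
  { dependents: ['one', 'two', 'three'], dependencies: ['two'] },
  { dependents: ['foo'], dependencies: ['one'] },
])

console.log([...graph.get('foo')].sort())
// -> ['bar', 'foo', 'one', 'three', 'two']
```

Because each dependent maps to a set, declaring 'foo' -> 'one' twice or 'two' -> 'two' explicitly changes nothing.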
Circular dependency
Haetae supports circular dependency as well. Although circular dependency is, in general, considered not a good practice, it's fully up to you to decide whether to define it. Haetae does not prevent you from defining it.
// Other content is omitted for brevity
const additionalGraph = await utils.graph({
  edges: [
    {
      dependents: ['index.js'],
      dependencies: ['foo'],
    },
    {
      dependents: ['foo'],
      dependencies: ['bar'],
    },
    {
      dependents: ['bar'],
      dependencies: ['index.js'],
    },
  ],
})
Assume the relations between index.js, foo, and bar are given by additionalGraph, and the rest are automatically detected.
In this situation, index.test.js is executed when any file except utils.test.js is changed, including foo and bar.
On the other hand, utils.test.js is executed only when utils.js or utils.test.js itself is changed.
More APIs not covered
There are more APIs related to the dependency graph, like js.graph, js.deps, utils.deps, utils.mergeGraph, etc.
This article doesn't cover them all. Check out the API docs for more detail.
Record Data
Haetae has a concept of 'Record' (type: core.HaetaeRecord) and 'Record Data' (type: core.HaetaeRecord.data).
In the previous sections, we've already seen terminal outputs like this.
$ haetae myTest
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 May 28 11:06:06 Asia/Seoul (timestamp: 1685239566483)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: 979f3c6bcafe9f0b81611139823382d615f415fd
⎜ branch: main
⎣ specVersion: 1
This information is logged in the store file (.haetae/store.json), and is called a 'Record'.
The data field is called 'Record Data'.
Let's check them out.
$ cat .haetae/store.json
The output is like this.
{
"specVersion": 1,
"commands": {
"myTest": [
{
"data": {
"@haetae/git": {
"commit": "1d17a2f2d75e2ac94f31e53376c549751dca85fb",
"branch": "main",
"specVersion": 1
}
},
"env": {},
"envHash": "bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f",
"time": 1685268205443
},
{
"data": {
"@haetae/git": {
"commit": "a4f4e7e83eedbf2269fbf29d91f08289bdeece91",
"branch": "main",
"specVersion": 1
}
},
"env": {
"NODE_ENV": "production"
},
"envHash": "4ed28f8415aeb22c021e588c70d821cb604c7ae0",
"time": 1685458529856
},
{
"data": {
"@haetae/git": {
"commit": "442fefc582889bdaee5ec2bd8b74804680fc30ee",
"branch": "main",
"specVersion": 1
}
},
"env": {
"NODE_ENV": "development"
},
"envHash": "2b580e42012efb489cdea43194c9dd6aed6b77d8",
"time": 1685452061199
},
{
"data": {
"@haetae/git": {
"commit": "ef3fdf88e9fad90396080335096a88633fbe893f",
"branch": "main",
"specVersion": 1
}
},
"env": {
"jestConfig": "642645d6bc72ab14a26eeae881a0fc58e0fb4a25af31e55aa9b0d134160436eb",
"jest": 29,
"branch": "main",
"os": "darwin",
"node": 18,
"haetae": 0
},
"envHash": "62517924fb2c6adb38b4f30ba75a513066f5ac80",
"time": 1685455507556
},
{
"data": {
"@haetae/git": {
"commit": "7e3b332f0657272cb277c312ff25d4e1145f895c",
"branch": "main",
"specVersion": 1
}
},
"env": {
"testData": "b87b8be8df58976ee7da391635a7f45d8dc808357ff63fdcda699df937910227"
},
"envHash": "7ea1923c8bad940a97e1347ab85abd4811e82531",
"time": 1685451151035
}
]
}
}
Env Hash
The field envHash is the SHA-1 hash of the env object.
The env object is serialized by a deterministic method, no matter how deep it is, and then hashed.
The hash is used to match the current env with previous records.
SHA-1 is considered insecure for hiding information, but it is good enough to prevent collisions for history comparison.
For example, git also uses SHA-1 for commit IDs.
When your Env or Record Data contains a confidential field and you're worried about the store being leaked, you can preprocess secret fields with a stronger cryptographic hash algorithm, like SHA-256 or SHA-512.
A practical guide with utils.hash() is given in the next section.
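As a rough illustration of "deterministic serialization, then SHA-1", here is a minimal sketch. The article does not describe Haetae's actual serialization format, so `stableStringify` and `hashEnv` are assumptions made for this example, and the resulting hashes are not expected to match the ones Haetae prints. CommonJS require is used only so the snippet runs standalone.

```javascript
const { createHash } = require('node:crypto')

// Serialize with object keys sorted at every depth, so key order never matters.
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// SHA-1 of the deterministic serialization, as a hex string.
function hashEnv(env) {
  return createHash('sha1').update(stableStringify(env)).digest('hex')
}

const a = hashEnv({ NODE_ENV: 'development', versions: { node: 18, jest: 29 } })
const b = hashEnv({ versions: { jest: 29, node: 18 }, NODE_ENV: 'development' })
console.log(a === b) // prints true
```

The point of sorting keys is that two env objects with the same content always map to the same history, regardless of how the object was built.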
recordRemoval.leaveOnlyLastestPerEnv of localFileStore
By default, you're using localFileStore as a 'Store Connector'.
localFileStore stores records in a file (.haetae/store.json).
The option recordRemoval.leaveOnlyLastestPerEnv is true by default.
So only the last Record per env exists in the store file.
This is useful when you only depend on the latest Records.
To utilize older Records, you can set the option to false.
Changing or configuring the 'Store Connector' is covered later.
5 Records are found in total.
They come from what we've done in this article so far.
Each of them is the last Record executed in its env.
For example, the command myTest was executed with env: {} on several commits, and 1d17a2f is the last one.
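The "only the last Record per env" behavior can be pictured with a small reduction over a list of Records. This is a sketch of the idea under the assumption that Records are matched by envHash and ordered by time; `keepLatestPerEnv` is a hypothetical helper, not the localFileStore implementation. The sample values are taken from the outputs shown earlier.

```javascript
// Keep only the most recent record for each envHash.
function keepLatestPerEnv(records) {
  const latest = new Map()
  for (const record of records) {
    const current = latest.get(record.envHash)
    if (!current || record.time > current.time) latest.set(record.envHash, record)
  }
  return [...latest.values()]
}

const records = [
  { envHash: 'bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f', time: 1685239566483, commit: '979f3c6' },
  { envHash: 'bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f', time: 1685268205443, commit: '1d17a2f' },
  { envHash: '4ed28f8415aeb22c021e588c70d821cb604c7ae0', time: 1685458529856, commit: 'a4f4e7e' },
]

console.log(keepLatestPerEnv(records).map((record) => record.commit))
// -> ['1d17a2f', 'a4f4e7e']
```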
Custom Record Data
Configuration files for your application are a good example showing the usefulness of Record Data.
I mean a config file not for Haetae, but for your project itself, such as dotenv (.env), .yaml, .properties, .json, etc.
Usually, an application config file satisfies these 2 conditions.
- It's not explicitly imported (e.g. import, require()) in the source code. Rather, the source code 'reads' it at runtime. ---> additionalGraph or env is useful.
- It's ignored by git. ---> 'Record Data' is useful.
Let's see how it works, with a simple example project using .env as the application config.
dotenv
.env is a configuration file for environment variables, and is NOT related to Haetae's env at all.
your-project
├── .env # <--- dotenv file
├── .gitignore # <--- ignores '.env' file
├── haetae.config.js
├── package.json
├── src
│   ├── config.js
│   ├── utils.js
│   ├── logic.js
│   └── index.js
└── test
    ├── utils.test.js
    ├── logic.test.js
    └── index.test.js
src/config.js reads the file .env, using the dotenv library for example.
import { config } from 'dotenv'

config()

export default {
  port: process.env.PORT,
  secretKey: process.env.SECRET_KEY,
}
Let's assume logic.js gets the values of environment variables through config.js, not by directly reading .env or process.env.
The explicit source code dependency graph is like this.
Let Haetae think config.js depends on .env.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async () => {
        const changedFiles = await git.changedFiles()
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/config.js'],
              dependencies: ['.env'],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph,
        })

        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})
Then the implicit dependency graph becomes explicit.
But that's not enough, because .env is ignored by git.
git.changedFiles() cannot detect whether .env changed or not.
Let's use 'Record Data' to solve this problem. Change the config file like this.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async ({ store }) => {
        const changedFiles = await git.changedFiles()
        const previousRecord = await store.getRecord()
        const dotenvHash = await utils.hash(['.env'])
        if (previousRecord?.data?.dotenv !== dotenvHash) {
          changedFiles.push('.env')
        }
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/config.js'],
              dependencies: ['.env'],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph,
        })

        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
        return {
          dotenv: dotenvHash
        }
      },
    },
  },
})
Now, we return an object from myTest.run.
Let's execute it.
$ haetae myTest
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 Jun 08 09:23:07 Asia/Seoul (timestamp: 1686183787453)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: ac127da6531efa487b8ee35451f24a70dc58aeea
⎜ branch: main
⎜ specVersion: 1
⎣ dotenv: 7f39224e335994886c26ba8c241fcbe1d474aadaa2bd0a8e842983b098cea894
Do you see the last line?
The value we returned from myTest.run is recorded in the store file, as part of Record Data.
Hash confidential
utils.hash() is good for secrets like a dotenv file.
By default, it hashes with SHA-256, and you can change the cryptographic hash algorithm through its options, for example to SHA-512.
Thus, you do not need to worry about the store file being leaked.
This time, .env was treated as a changed file, as the key dotenv did not exist in previousRecord.
// Other content is omitted for brevity
if (previousRecord?.data?.dotenv !== dotenvHash) {
  changedFiles.push('.env')
}
Therefore, index.test.js and logic.test.js, which transitively depend on .env, are executed.
If you run Haetae again immediately,
$ haetae myTest
this time no test is executed, as nothing is considered changed. .env is treated as not changed, thanks to the Record Data.
From now on, though the file .env is ignored by git, changes to it are tracked through custom Record Data.
So it can be used in incremental tasks.
Reserved Record Data
We can enhance the workflow further.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async () => {
        const changedFiles = await git.changedFiles()
        const changedFilesByHash = await utils.changedFiles(['.env'])
        changedFiles.push(...changedFilesByHash)
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/config.js'],
              dependencies: ['.env'],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph,
        })

        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
        // No return value
      },
    },
  },
})
We return nothing here.
We do not calculate the hash ourselves.
But this has the same effect as what we've done in the previous section.
$ haetae myTest
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 Jun 11 00:27:40 Asia/Seoul (timestamp: 1686410860187)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: 018dd7e0c65c3a9d405485df7949ef75ff96e757
⎜ branch: main
⎜ specVersion: 1
⎜ "@haetae/utils":
⎜ files:
⎜ .env: 7f39224e335994886c26ba8c241fcbe1d474aadaa2bd0a8e842983b098cea894
⎣ specVersion: 1
You can see the hash of .env is recorded.
utils.changedFiles automatically writes the hash in Record Data, and compares the current hash with the previous one.
How is this possible?
There's a concept of Reserved Record Data.
If you call core.reserveRecordData, you can 'reserve' Record Data without directly returning custom Record Data from the command's run function.
git.changedFiles and utils.changedFiles call core.reserveRecordData internally.
This mechanism can be especially useful for shareable generic features, like a 3rd-party library for Haetae.
For that, it's important to avoid naming collisions.
Record Data can have arbitrary fields.
So Haetae uses a package name as a namespace by convention.
The '@haetae/git' and '@haetae/utils' keys in Record Data are namespaces to avoid such collisions.
Multiple Reserved Record Data
All Reserved Record Data are saved in the list reservedRecordDataList.
The list is merged by deepmerge.
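To illustrate what merging a list of reserved entries means, here is a small hand-written deep merge. It is not the deepmerge library and may differ from it in details (for example in how arrays are handled); `mergeDeep` and the sample values are hypothetical.

```javascript
// Recursively merge plain objects; for any other value, the later one wins.
function mergeDeep(target, source) {
  const result = { ...target }
  for (const [key, value] of Object.entries(source)) {
    const existing = result[key]
    const bothObjects = [existing, value].every(
      (v) => v !== null && typeof v === 'object' && !Array.isArray(v),
    )
    result[key] = bothObjects ? mergeDeep(existing, value) : value
  }
  return result
}

// Hypothetical reserved entries, namespaced by package name.
const reservedRecordDataList = [
  { '@haetae/git': { commit: 'hypothetical-commit', branch: 'main' } },
  { '@haetae/utils': { files: { '.env': 'hypothetical-hash-1' } } },
  { '@haetae/utils': { files: { '.env.local': 'hypothetical-hash-2' } } },
]

const recordData = reservedRecordDataList.reduce(mergeDeep, {})
console.log(Object.keys(recordData['@haetae/utils'].files))
// -> ['.env', '.env.local']
```

Namespacing by package name is what keeps entries reserved by different libraries from overwriting each other during the merge.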
utils.changedFiles is more useful for multiple files.
Let's say you have multiple dotenv files per environment, unlike the previous assumption.
For example, .env.local, .env.development, and .env.staging are targets to test.
Now, config.js reads .env.${process.env.ENV}, where $ENV is an indicator of the environment: 'local', 'development' or 'staging'.
Then we can modify the config file like this.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async () => {
        const changedFiles = await git.changedFiles()
        const changedFilesByHash = await utils.changedFiles(
          ['.env.*'], // or explicit glob pattern ['.env.{local,development,staging}']
          {
            renew: [`.env.${process.env.ENV}`],
          },
        )
        changedFiles.push(...changedFilesByHash)
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/config.js'],
              dependencies: [`.env.${process.env.ENV}`],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph,
        })

        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})
renew is a list of files (or glob patterns) whose recorded hashes will be renewed (if changed) with their current hashes.
By default, renew is equal to all the files (['.env.*']) we gave as the argument.
In our config, by limiting it to .env.${process.env.ENV}, you only renew the single dotenv file.
Let's say $ENV is currently 'local'.
Obviously, .env.local, .env.development, and .env.staging are compared to their previous hashes.
If changes are detected, the changed files are included in the result array.
But regardless of that, .env.development and .env.staging are not renewed in the new Record Data.
Their previous hashes will be written in the new Record instead of the current hashes.
This behavior can be good for our test in many scenarios.
For instance, you may modify .env.development when $ENV is 'local'.
As it's not in the renew list, the hash of .env.development is not updated.
When $ENV later becomes 'development', utils.changedFiles would still think .env.development is a changed file, as the current hash and the previously recorded hash are different.
This makes sure the test files are executed when $ENV becomes 'development'.
renew exists to bridge the discrepancy between when the physical change actually happens and when the detection of the change is needed.
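The scenario above can be simulated with plain objects. This is a simplified model of the described behavior, not the real utils.changedFiles: `changedFilesWithRenew` is a hypothetical helper, and the short strings stand in for real hashes.

```javascript
// previous: hashes stored in the last Record; current: hashes computed now.
function changedFilesWithRenew(previous, current, renew) {
  const changed = Object.keys(current).filter((file) => previous[file] !== current[file])
  const next = { ...previous }
  for (const file of renew) {
    if (file in current) next[file] = current[file] // only renewed files get the new hash
  }
  return { changed, next }
}

// $ENV is 'local', but .env.development was edited.
const previous = { '.env.local': 'l1', '.env.development': 'd1', '.env.staging': 's1' }
const current = { '.env.local': 'l1', '.env.development': 'd2', '.env.staging': 's1' }
const first = changedFilesWithRenew(previous, current, ['.env.local'])
// first.changed -> ['.env.development'], but first.next still stores 'd1' for it

// Later $ENV becomes 'development': the stale hash still marks the file as changed.
const second = changedFilesWithRenew(first.next, current, ['.env.development'])
// second.changed -> ['.env.development'], and second.next now stores 'd2'

// One more run in 'development': nothing is changed anymore.
const third = changedFilesWithRenew(second.next, current, ['.env.development'])
console.log(first.changed, second.changed, third.changed)
```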
utils.changedFiles has many more options, and acts in a sophisticated way.
For example, with the option keepRemovedFiles, which is not introduced above, you can handle cases where not all of the files exist on the filesystem at the same time and only a few of them are dynamically used in incremental tasks.
For instance, a CI workflow might have access to only .env.development at a certain time, while it might have access to only .env.staging at another time.
And you may still want the incremental history not separated but shared between the two cases.
That's where keepRemovedFiles comes in.
utils.changedFiles is not covered thoroughly here.
Check out the API docs for more detail.
There's one more thing to be careful about with utils.changedFiles.
You should NOT give a dynamic files argument to it.
Otherwise, a file would be treated as changed every time the dynamic argument changes.
// Other content is omitted for brevity
const changedFilesByHash = await utils.changedFiles(
  [`.env.${process.env.ENV}`] // <--- Anti-pattern
)
The snippet above lets only a single file be recorded.
So, if $ENV is changed, the previous file is no longer recorded.
This causes no safety problem, but reduces incrementality.
Therefore you should list all of the candidates, like ['.env.*'].
Root Env and Root Record Data
Haetae has a concept of 'Root Env' (type: core.RootEnv) and 'Root Record Data' (type: core.RootRecordData).
They are decorator-like transformers for the return values of env and run of every command.
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  recordData: async (data) => ({ // <--- 'Root Record Data'
    hello: data.hello.toUpperCase(),
  }),
  commands: {
    myGreeting: {
      run: () => ({ hello: 'world' }),
    },
  },
})
$ haetae myGreeting
✔ success Command myGreeting is successfully executed.
⎡ 🕗 time: 2023 Jun 14 15:49:52 Asia/Seoul (timestamp: 1686725392672)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎣ hello: WORLD # <--- uppercased
Let's get into a more practical example.
You may want the config file's hash to be automatically recorded into every command's env.
import * as url from 'node:url'
import { $, configure, git, utils, js } from 'haetae'

export default configure({
  env: async (env) => ({ // <--- 'Root Env'
    ...env,
    // Equivalent to: await utils.hash(['haetae.config.js'])
    haetaeConfig: await utils.hash([url.fileURLToPath(import.meta.url)]),
  }),
  commands: {
    myGreeting: {
      env: {
        NODE_ENV: process.env.NODE_ENV
      },
      run: () => { /* ... */ }
    },
  },
})
With Root Env, it's done in a single place.
$ haetae myGreeting --env
✔ success Current environment is successfully evaluated for the command myGreeting
⎡ env:
⎜ NODE_ENV: development
⎜ haetaeConfig: f7c12d5131846a5db496b87cda59d3e07766ed1bde8ed159538e85f42f3a8dae
⎣ envHash: e9422335258f9338b7205d11aafdb329bb008f7a
By the way, you can be even more thorough.
js.deps lists every direct and transitive dependency.
// Other content is omitted for brevity
haetaeConfig: await utils.hash(
  await js.deps({ entrypoint: url.fileURLToPath(import.meta.url) }),
),
This snippet calculates a hash of the config file and its dependencies.
For example, if you import a.js into haetae.config.js, and a.js depends on b.js, then the hash is calculated against the three files: haetae.config.js, a.js, and b.js.
When hashing multiple files, a single-depth Sorted Merkle Tree is used.
Check out the API docs for more detail.
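For intuition, a single-depth sorted Merkle tree can be sketched as: hash each file, sort the leaf hashes, then hash their concatenation. The exact scheme used by utils.hash is not described here, so the details below (leaves built from content only, SHA-256, plain concatenation) are assumptions, and `hashFiles` with its in-memory `contents` map is a hypothetical stand-in for reading real files.

```javascript
const { createHash } = require('node:crypto')

const sha256 = (input) => createHash('sha256').update(input).digest('hex')

// contents: hypothetical map of file path -> file content
function hashFiles(contents) {
  const leaves = Object.values(contents).map(sha256).sort() // sorted leaf hashes
  return sha256(leaves.join('')) // single parent level: the root hash
}

const one = hashFiles({ 'haetae.config.js': 'A', 'a.js': 'B', 'b.js': 'C' })
const reordered = hashFiles({ 'b.js': 'C', 'haetae.config.js': 'A', 'a.js': 'B' })
const edited = hashFiles({ 'haetae.config.js': 'A', 'a.js': 'B', 'b.js': 'C2' })

console.log(one === reordered) // prints true
console.log(one === edited) // prints false
```

Sorting the leaves makes the result independent of the order in which the files are listed, while any content change in any file still changes the root.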
If you don't import other modules in the config, this is not necessary.