Solana Feature Gates

Last week I had to dive into the world of Solana Feature Gates for a project I was working on, so I wanted to summarize some of my learnings in case it’s useful or interesting to anyone else.

Solana’s feature gate mechanism allows the network of validators to update the behavior of the Solana runtime without causing a hardfork of the network. On Ethereum, changes to a validator client require huge amounts of coordination. This means a higher fixed cost per upgrade, which incentivizes packing more changes into each hardfork, which leads to more delays in shipping.

Comparing the steps for a Solana feature activation and an Ethereum hardfork is really cool to see; it shows how IBRL is not just “Solana culture”. It’s baked into the protocol itself.

Solana is continuous integration, Ethereum is large versioned releases.

Ethereum Hardforks

At a high level, an Ethereum hardfork requires:

A set of approved EIPs to be agreed upon to be included in the hardfork
All Ethereum clients to implement these new EIPs*
An activation day to be chosen and that timestamp to be HARDCODED into the protocol activation
All validator runners to update their clients by the time the fork is due
When the chain reaches the activation timestamp, the behavior is changed and any clients that haven’t updated are creating invalid blocks (thus, a hard fork of the network)

This process requires coordination among all client implementations, a quorum of validator runners to upgrade in time, and leads to huge changes to the network that are irreversible without another hardfork.

If Ethereum were a developer, he’s the guy that doesn’t submit his PR until it’s “just right” and leaves you with 50k lines of code to review.

* I will note that Ethereum has more client implementations to coordinate, which is arguably better for decentralization - leading to fewer single points of failure in the software supply chain

Solana Feature Activations

Compare that to Solana Feature Activations: each SIMD gets its own feature activation that can go live semi-automatically as enough clients adopt the new version.

Generally, the steps for a feature activation are:

A SIMD is written and debated over, and possibly approved
A pubkey for that SIMD activation is generated, and added to the list of features in the Solana clients, starting off as inactive
The post-activation implementation of that feature is added to the clients
Validators update their client to support the new feature
Once a quorum of validators (95% of stake) have updated, the feature account is initialized by a core dev of the feature with a CLI command, marking the feature as “pending”
At the next epoch change, validators check for pending activations, find the new feature, and mark it as active internally.

This architecture means that features can be implemented and shipped in small batches, bringing a consistent flow of incremental changes to the network. Each network upgrade can be felt on its own, rather than being a part of a huge package of changes.

Another beautiful thing about the Solana features is that with the introduction of SIMD-0089, pending feature activations can be revoked.

Now that we’ve seen how these activations work at a high level, let’s look at some code.

The Feature Account

A feature activation account is a simple account that is owned by the feature program, Feature111111111111111111111111111111111111, and represented by the struct:

pub struct Feature {
    pub activated_at: Option<u64>,
}
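
To make the layout concrete, here is a minimal sketch of how that struct maps to bytes, assuming the usual bincode encoding of Option<u64>: a one-byte tag, followed by a little-endian u64 when the tag is 1. The helpers encode_feature and decode_feature are names I made up for this example; the real client goes through bincode rather than hand-rolled code like this.

```rust
/// Mirror of the struct above, redeclared so the sketch is self-contained.
#[derive(Debug, PartialEq)]
pub struct Feature {
    pub activated_at: Option<u64>,
}

/// Hypothetical helper: writes a 1-byte tag, then the slot as little-endian.
/// The buffer is always 9 bytes, so a pending account has room to be activated.
pub fn encode_feature(feature: &Feature) -> [u8; 9] {
    let mut data = [0u8; 9];
    if let Some(slot) = feature.activated_at {
        data[0] = 1;
        data[1..9].copy_from_slice(&slot.to_le_bytes());
    }
    data
}

/// Hypothetical helper: returns None if the data is too short or the tag is invalid.
pub fn decode_feature(data: &[u8]) -> Option<Feature> {
    match *data.first()? {
        0 => Some(Feature { activated_at: None }),
        1 => {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(data.get(1..9)?);
            Some(Feature {
                activated_at: Some(u64::from_le_bytes(bytes)),
            })
        }
        _ => None,
    }
}
```

Under this assumption, a pending feature encodes to nine zero bytes, and activation only has to flip the tag and fill in the slot.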

If there’s a feature being activated at myFEATURE11111111111111111111111, it can have three relevant states:

Inactive: When the account does not exist or is not owned by the Feature111111111111111111111111111111111111 program, the feature is considered inactive. At any epoch change the validators will ignore this account, and the runtime code paths for this feature stay disabled.

Pending: Once the feature account is owned by Feature111111111111111111111111111111111111, if the account data is empty or the first byte is 0 (i.e., activated_at: None), it’s in a “pending” state. Once an epoch boundary is crossed, all validators will find this pending feature and set its activated_at field to the current slot.

Active: Once the feature account is owned by Feature111111111111111111111111111111111111 and the activated_at field is filled, the feature is active and the new behavior is live.
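
The three states boil down to two checks: the owner first, then activated_at. Here is a minimal sketch of that decision. AccountView, FeatureState, and classify are hypothetical names for this example, and the account is modeled as already decoded rather than as raw bytes.

```rust
/// Owner that marks an account as a feature account (from the text above).
pub const FEATURE_PROGRAM_ID: &str = "Feature111111111111111111111111111111111111";

/// Hypothetical, already-decoded view of an account.
pub struct AccountView<'a> {
    pub owner: &'a str,
    pub activated_at: Option<u64>,
}

#[derive(Debug, PartialEq)]
pub enum FeatureState {
    Inactive,
    Pending,
    Active { slot: u64 },
}

/// Classifies a feature from its (possibly missing) account.
pub fn classify(account: Option<&AccountView>) -> FeatureState {
    match account {
        // No account, or the wrong owner: validators ignore it.
        None => FeatureState::Inactive,
        Some(a) if a.owner != FEATURE_PROGRAM_ID => FeatureState::Inactive,
        // Right owner: the state depends only on activated_at.
        Some(a) => match a.activated_at {
            None => FeatureState::Pending,
            Some(slot) => FeatureState::Active { slot },
        },
    }
}
```

Calling classify(None) gives Inactive, while an account owned by the feature program with activated_at: None gives Pending.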

These three states raise a few questions. How do the validators know to check for the new feature myFEATURE11111111111111111111111 at an epoch boundary? And how is the new behavior actually implemented at the client level?

Feature Sets

When a feature author is implementing a feature, they first generate the feature’s pubkey (e.g. myFEATURE11111111111111111111111). Then they register it in the Agave client’s feature-set crate: they declare the pubkey for the new feature, add it to the list of all FEATURE_NAMES, and add a flag for it to FeatureSnapshot, which tracks features that are not active on all clusters:

pub mod my_new_feature_does_great_things_i_promise {
    solana_pubkey::declare_id!("myFEATURE11111111111111111111111");
}
 
...
 
pub static FEATURE_NAMES: LazyLock<AHashMap<Pubkey, &'static str>> = LazyLock::new(|| {
    [
        ...
        (
            my_new_feature_does_great_things_i_promise::id(),
            "SIMD_XXXX: GREAT THINGS",
        ),
    ]
    .iter()
    .cloned()
    .collect()
});
 
...
 
 
pub struct FeatureSnapshot {
    ...
    pub my_new_feature_does_great_things_i_promise_enabled: bool,
}

Internally, the validator maintains a FeatureSet struct:

pub struct FeatureSet {
    active: AHashMap<Pubkey, u64>,
    inactive: AHashSet<Pubkey>,
    snapshot: FeatureSnapshot,
}

With this information, the validator client knows about all of the feature accounts. On startup, it can treat all features as inactive. Then it can fetch the account state for each account in FEATURE_NAMES to determine if it’s active, and activate it if so.
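
As a rough model of that startup flow, here is a simplified feature set. It is only a sketch: SimpleFeatureSet and its methods are hypothetical, string IDs stand in for Pubkey, std collections stand in for AHashMap and AHashSet, and the snapshot field is left out.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for the validator's feature set (hypothetical).
pub struct SimpleFeatureSet {
    active: HashMap<String, u64>,
    inactive: HashSet<String>,
}

impl SimpleFeatureSet {
    /// Startup state: every feature the client was built with begins inactive.
    pub fn all_inactive(known: &[&str]) -> Self {
        SimpleFeatureSet {
            active: HashMap::new(),
            inactive: known.iter().map(|id| id.to_string()).collect(),
        }
    }

    /// Moves a feature to the active map, recording its activation slot.
    pub fn activate(&mut self, id: &str, slot: u64) {
        self.inactive.remove(id);
        self.active.insert(id.to_string(), slot);
    }

    /// True once the feature has an activation slot recorded.
    pub fn is_active(&self, id: &str) -> bool {
        self.active.contains_key(id)
    }

    /// The slot at which the feature was activated, if any.
    pub fn activated_slot(&self, id: &str) -> Option<u64> {
        self.active.get(id).copied()
    }
}
```

On startup the client would build the set with all_inactive, then call activate for every feature whose account turns out to be active.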

When a new epoch is being processed, these accounts can once again be iterated to find features that are pending activation:

fn compute_active_feature_set(&self, include_pending: bool) -> (FeatureSet, AHashSet<Pubkey>) {
	let mut active = self.feature_set.active().clone();
	let mut inactive = AHashSet::new();
	let mut pending = AHashSet::new();
	let slot = self.slot();
 
	for feature_id in self.feature_set.inactive() {
		let mut activated = None;
		if let Some(account) = self.get_account_with_fixed_root(feature_id) {
			if let Some(feature) = feature::state::from_account(&account) {
				match feature.activated_at {
					None if include_pending => {
						// Feature activation is pending
						pending.insert(*feature_id);
						activated = Some(slot);
					}
					Some(activation_slot) if slot >= activation_slot => {
						// Feature has been activated already
						activated = Some(activation_slot);
					}
					_ => {}
				}
			}
		}
		if let Some(slot) = activated {
			active.insert(*feature_id, slot);
		} else {
			inactive.insert(*feature_id);
		}
	}
 
	(FeatureSet::new(active, inactive), pending)
}

Each inactive feature account is fetched again to check whether there are new features to activate. This lets the validator treat the feature as active from here on out, and it also produces a list of “new feature activations”, so that any one-time code paths the feature needs in order to function can be run.
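
The “one-time” part is the bit I found easiest to understand with a toy model. Below is a sketch under heavy assumptions: FeatureAccounts and apply_pending are hypothetical, accounts live in an in-memory map, and the owner check is skipped entirely.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for feature accounts:
/// feature ID -> activated_at (a missing key means the account does not exist).
pub type FeatureAccounts = HashMap<String, Option<u64>>;

/// Simplified epoch-boundary pass. Returns the features activated by this call,
/// in sorted order, and records `slot` as their activation slot.
pub fn apply_pending(accounts: &mut FeatureAccounts, slot: u64) -> Vec<String> {
    let mut newly_activated = Vec::new();
    for (id, activated_at) in accounts.iter_mut() {
        if activated_at.is_none() {
            // Pending: activate now, and report it so one-time code can run.
            *activated_at = Some(slot);
            newly_activated.push(id.clone());
        }
    }
    newly_activated.sort();
    newly_activated
}
```

A pending feature shows up in the returned list at the first epoch boundary after it was queued. At the next boundary it is already active, so the list is empty and the one-time code does not run again.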

This leads us to the actual feature implementations. For that, we’ll look at a couple of examples.

SIMD-0194

@deanmlittle recently got SIMD-0194 activated on mainnet. The SIMD is beautifully simple, and it showcases the kind of epoch-boundary change that a newly activated feature can implement:

if new_feature_activations.contains(&feature_set::deprecate_rent_exemption_threshold::id())
{
	self.rent_collector.rent.lamports_per_byte_year =
		(self.rent_collector.rent.lamports_per_byte_year as f64
			* self.rent_collector.rent.exemption_threshold) as u64;
	self.rent_collector.rent.exemption_threshold = 1.0;
	self.update_rent();
}

If this feature is newly activated, the exemption threshold is folded into lamports_per_byte_year and then reset to 1.0. The body of this if statement is executed once and voilà, feature implemented.
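
To see why this is safe, it helps to run the numbers. The sketch below redeclares a stripped-down Rent with just the two fields involved. The 128-byte storage overhead and the minimum-balance formula are how I understand the rent calculation to work, so treat them as assumptions, and fold_threshold is a name I invented for the transformation above.

```rust
/// Simplified copy of the two rent fields touched by the feature.
pub struct Rent {
    pub lamports_per_byte_year: u64,
    pub exemption_threshold: f64,
}

/// Per-account storage overhead in bytes (assumed value).
pub const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;

impl Rent {
    /// Rent-exempt minimum for an account with `data_len` bytes of data
    /// (assumed formula).
    pub fn minimum_balance(&self, data_len: u64) -> u64 {
        let bytes = ACCOUNT_STORAGE_OVERHEAD + data_len;
        ((bytes * self.lamports_per_byte_year) as f64 * self.exemption_threshold) as u64
    }

    /// Same transformation as the activation code above:
    /// fold the threshold into the per-byte price, then reset it to 1.0.
    pub fn fold_threshold(&mut self) {
        self.lamports_per_byte_year =
            (self.lamports_per_byte_year as f64 * self.exemption_threshold) as u64;
        self.exemption_threshold = 1.0;
    }
}
```

Starting from lamports_per_byte_year = 3480 and exemption_threshold = 2.0 (the values I’m assuming as defaults), folding gives 6960 and 1.0. The rent-exempt minimum for an empty account is 890,880 lamports both before and after, so nothing changes for users.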

SIMD-0266

SIMD-0266 by @0x_febo introduces p-token, significantly improving the CU efficiency of SPL token instructions. Given the massive scale and impact of this change, the simplicity of the actual feature activation code is almost comical:

if new_feature_activations.contains(&feature_set::replace_spl_token_with_p_token::id()) {
	if let Err(e) = self.upgrade_loader_v2_program_with_loader_v3_program(
		&feature_set::replace_spl_token_with_p_token::SPL_TOKEN_PROGRAM_ID,
		&feature_set::replace_spl_token_with_p_token::PTOKEN_PROGRAM_BUFFER,
		self.feature_set
			.snapshot()
			.relax_programdata_account_check_migration,
		"replace_spl_token_with_p_token",
	) {
		warn!(
			"Failed to replace SPL Token with p-token buffer '{}': {e}",
			feature_set::replace_spl_token_with_p_token::PTOKEN_PROGRAM_BUFFER,
		);
	}
}

It’s as simple as “if we’re activating this feature, upgrade our SPL-token program to point to a new program”. The actual new token program can be written, tested, and deployed (to the PTOKEN_PROGRAM_BUFFER account) completely independently. When the feature is ready to be activated, it’s a simple swap and the whole network benefits from the CU reduction. Incredible!

SIMD Checker

I started diving into the world of Solana feature activations because I was building a simple tool to verify that each SIMD activation is actually doing what we expect it to do: the simd-checker.

Inspired by @deanmlittle’s simd-0194-checker, with this project I want to compile a set of simple on-chain programs that showcase the runtime behavior of each SIMD.

While only a few SIMDs are implemented currently, each test will be composed of:

an entry in the manifest showing the SIMD’s activation status on each network, the deployment status of the test program, and its relationship to other SIMDs, if any
an on-chain program that detects if the expected feature is in the expected activation status and allows verifying runtime values for a feature
an RPC-layer component that allows writing some “non-runtime” checks

When a feature is tested locally using surfpool, each feature is tested twice: one surfnet is started with the feature inactive (but all other previous features active), to verify the runtime behaves as expected when the feature is not yet active, and another is started with the feature active, verifying the enabled state.
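
The shape of that double run can be sketched in a few lines. This is not the simd-checker’s actual code and it does not call any surfpool API: check_both_states is a hypothetical helper, and the observe closure stands in for “start a local network with this feature state and read a runtime value”.

```rust
/// Runs the same observation with the feature off and then on, and compares
/// each result against its expected value. Entirely hypothetical: `observe`
/// stands in for starting a local network and reading a runtime value.
pub fn check_both_states<F>(
    observe: F,
    expect_inactive: u64,
    expect_active: u64,
) -> Result<(), String>
where
    F: Fn(bool) -> u64,
{
    for &(feature_active, expected) in [(false, expect_inactive), (true, expect_active)].iter() {
        let actual = observe(feature_active);
        if actual != expected {
            return Err(format!(
                "feature_active={}: expected {}, got {}",
                feature_active, expected, actual
            ));
        }
    }
    Ok(())
}
```

Checking the inactive case as well as the active one guards against a test that passes for the wrong reason, such as a value that was already at its post-activation setting before the feature went live.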

Like I said, only a few SIMDs are currently implemented. So if you’re interested in learning more about a specific SIMD, open up a PR to add a test to this tool! Who knows, you may discover a runtime bug before a feature is activated!
