How I contributed to Redis
Notes from the first open-source contribution - Redis
Overview
The Internet and many applications we use every day are powered by open-source software. Many popular software projects, such as Linux, WordPress, and Firefox, are open source, as well as React Native, Angular, and Kubernetes.
Contributing to an open-source project can seem simple, but getting a pull request accepted is quite a challenge. On this blog, I want to share my experiences and lessons learned while contributing to a major open-source - Redis for the first time.
History
Redis project was started in early 2009 by an Italian developer named Salvatore Sanfilippo (or antirez). Redis was initially written to improve the performance of LLOOGG, a real-time web analytics product out of Salvatore’s startup. By June of 2009, Redis was stable enough and had enough of a base feature set, to serve production traffic at LLOOGG. On June 19, 2009, an important milestone was hit: Salvatore deployed Redis to LLOOGG’s production environment and retired the MySQL installation. Over the next months, Redis rapidly grew in popularity. Salvatore fostered a great community, added features at a very rapid pace, and dealt with any and all reports of database corruption, instability, etc. with the utmost severity. In March of 2010, VMWare hired Salvatore to work full-time on Redis. (Redis itself remains BSD licensed.) Shortly thereafter, VMWare hired Pieter Noordhuis, a key Redis contributor, to give the project an additional momentum boost.
“The rest, as they say, is history”.
Redis is one of the most popular open-source engines today, and has been named the "Most Loved" database by Stack Overflow for five consecutive years. Because of its fast performance, Redis is a popular choice for caching, session management, gaming, leaderboards, real-time analytics, geospatial, ride-hailing, chat/messaging, media streaming, and pub/sub apps.
Although there’s no “Redis Inside” sticker affixed to the various products or services that use Redis, odds are you use Redis daily because Uber, Instacart, Slack, Hulu, Twitter, Instagram, and many more companies are using it.
“What Redis is good for is not my choice—it’s the application developer that knows” - antirez
Where did the pull request come from?
There are many ways to contribute to an open-source project, such as:
- Write and improve project documentation.
- Find and fix bugs.
- Propose and implement new features.
I submitted this pull request after finding a bug in Redis while evaluating chaos testing in our system. The details about the Redis bug will be presented in the next section. As for what Chaos Testing is, you can understand it is a technique that helps us explore edge cases in our services. For more details, our teams also have an excellent blog here.
More about the Redis bug
Allow writes only with N attached replicas feature
Redis supports high availability and failover through replication (the more powerful solutions are Redis Sentinel and Redis Cluster). In that case, one Redis instances acts as the master with multiple slaves attached to it. Data will be updated on this master and then the master replicates those changes to its slaves. With it, when a master is down, one slave will be advanced to a new master with full imitated data.
But there is still one case in which the master is not actually down but just temporarily disconnected from his slave. If the client continues to write data to this old master, data will be lost forever since when the partition heals, the master will be reconfigured as a replica of the new master, thus discarding its data set. You can see the below image.
To mitigate this inconsistency during a network partition, Redis provides a feature in replication: a Redis master to accept write queries only if at least N replicas are currently connected to the master. The state of "connected" is defined through two configuration parameters:
min-replicas-to-write <number of replicas>
min-replicas-max-lag <number of seconds>
It means if there are at least N replicas, with a lag of less than M seconds, then the write will be accepted. You may think of it as a best-effort data safety mechanism, where consistency is not ensured for a given write, but at least the time window for data loss is restricted to a given number of seconds. In general, bound data loss is better than unbound one.
Bug: Lack of checking for Lua script
From version 2.6.0, Redis allows clients to evaluate scripts using the Lua interceptor through EVAL command to gain better performance.
Currently, our service almost uses Lua scripts to interact with Redis and also enables the above replication feature to ensure consistency when split-brain occurs. When I made Chaos testing through simulating network partition in our Redis cluster, we expected the writes to the isolated master node would be rejected but nothing happened. So, I guess it is a bug and opened an issue for Redis, you can reproduce it as my description in the issue (9993).
Is this bug critical? In a certain aspect, this bug won't be a problem if your systems are not using Redis scripting. Otherwise, it causes a potential risk of data loss in split-brain scenarios, especially if data stored in Redis is sensitive and important. In the case of our service, this bug is serious because it can break our design to protect the service from failure.
Ship the pull request
Here’s the step-by-step action I did for this pull request:
- Read README
- Read code to identify the problem
- Fix bug and write test
- Submit the pull request to solicit feedback
Fix bug
From the above section, this bug is quite simple to fix by adding a minor check when evaluating Lua scripts for the above replication feature.
Once you know what exactly need to do, the rest is easy. The block of code (line 378 → 385) verify the master node has enough good slaves to accept write commands.
Write test
The next step is to write tests to verify everything works correctly. Redis has a test suite written in TCL that includes multiple testing scenarios: unit, module, or integration test.
TCL ("tickle") is a high-level , general-purpose , interpreted, dynamic programming language. It is commonly used embedded into C applications, for rapid prototyping, scripted applications, GUIs, and testing.
Func fact: This is why Redis uses TCL for testing by its author.
The current test cases have covered for "Allow writes only with N attached replicas" feature. It is in tests/integration-4.tcl
. So, I just simply modify and add more tests to evaluate Lua's script based on that.
Below is an example test case:
test {With min-slaves-to-write: master not writable with lagged slave} {
$master config set min-slaves-max-lag 2
$master config set min-slaves-to-write 1
assert_equal OK [$master set foo 123]
assert_equal OK [$master eval "return redis.call('set','foo',12345)" 0]
# Killing a slave to make it become a lagged slave.
exec kill -SIGSTOP [srv 0 pid]
# Waiting for slave kill.
wait_for_condition 100 100 {
[catch {$master set foo 123}] != 0
} else {
fail "Master didn't become readonly"
}
assert_error "*NOREPLICAS*" {$master set foo 123}
assert_error "*NOREPLICAS*" {$master eval "return redis.call('set','foo',12345)" 0}
exec kill -SIGCONT [srv 0 pid]
}
Run test:
# Required TCL 8.5 or newer to run tests
./runtest
Submit Pull Request
When making sure that everything works well, this is the time to send the pull request to solicit feedback (10160).
Each pull request needs to be reviewed and approved by at least 2 persons before it is merged. Take a little bit about reviewers.
- organa: whose name sounds like a character in Start War, he is working in Redis Labs and currently is one of the main maintainers of Redis open-source project. He is 4th in most contributors to the project.
- MeirShpilraien: He is a contributor that has a significant proposal and implementation for a big new feature in Redis 7.0 Redis FUNCTIONs (8693). He is involved because this issue also causes a bug in his new proposal feature.
- And my boss Le Duc Anh (anhldbk), motivates and gives useful guides to me when making this pull request.
Finally, after more reviewing (more than 20 comments), the pull request is merged into master. At the time of this blog, this fix is bundled in Bug fix in v6.2.7 and v7.0-rc2 (pre-release version) of Redis. And official Redis 7.0 be launched. Let's discover it!
What did I learn?
These are the main things I learned from my first open-source contribution:
- The first thing is how to contribute to an open-source project: from forks code to how to submit a pull request.
- Start small: while being ambitious is good, it will be easier to start from a slight change. It can be a bug fix or improvement to documentation or tests adding. Don’t feel like your contribution has to be hundreds of lines of code. Even one line fix will be a huge benefit to the community.
- Read contribution guidelines: everyone wants to jump into the codebase to start making changes. But still, it is a better idea to at least skim through documentation and contribution guidelines. Doing so may save you a lot of time because you would be able to get a general picture of the whole project, run the tests and be sure that they all pass before removing and adding anything new. Once you are happy with your code, go back to read them more thoroughly and be sure that you comply with contribution guidelines before submitting your first Pull Request.
- Finally, a way to make our system better is making its dependencies stronger. Contributing to open source is an amazing opportunity to give back to the community itself.
Acknowledgements
I would like to thank Le Duc Anh for all the encouragement and guidance to make this pull request to final and my teammate Nguyen Quoc Anh for his work on Chaos testing giving me an opportunity to find out this error.
Final Thoughts
In conclusion, contributing to an open-source project is not a big deal, going through step by step will eventually reach the destination. I only make a small bug fix, but I hope the above notes will be useful when you intend to contribute to an open-source project.
Go ahead and make your first contribution!