Write code that is easy to delete, not easy to extend


Translator's Preface


This translation is hosted on GitHub at https://github.com/freedombird9/code-easy-to-delete. Stars and corrections are welcome.


Good writing offers original insight, deep expertise, and clear logic. This is an article about how to design and structure code, and its argument is fresh and forceful: everything we do (refactoring, modularisation, layering, and so on) is there to make our code easy to delete and to keep legacy code from becoming a burden, not to enable code reuse.

The author argues that code like this is distilled through seven distinct stages of development, each described in detail with examples.

On a first read the article can feel abstract and obscure, but after a few more passes its point becomes clear.

One sleepless night produced this Chinese translation, which I share here in the hope that it is of some help to readers.

The translation is hosted on GitHub. My ability is limited, so corrections and suggestions are very welcome.

Thanks

Thank you to Qiu for sharing this article with me.

The translated article follows.

programming is terrible: lessons learned from a life wasted

2016-02-13

Write code that is easy to delete, not easy to extend.


"Every line of code is written without reason, maintained out of weakness, and deleted by chance." Jean-Paul Sartre's Programming in ANSI C.


Every line of code written comes at a price: maintenance. To avoid paying for a lot of code, we build reusable software. The problem with code reuse is that it gets in the way of changing your mind later on.

The more consumers an API has, the more code has to be rewritten to introduce a change. Similarly, the more you rely on a third-party API, the more you suffer when it changes. Managing how pieces of code fit together, and how modules depend on one another, is a significant problem in large-scale systems, and it only gets harder as a project grows older.


My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent". EWD 1036


If we see "lines of code" as "lines spent", then when we delete those lines we are lowering the cost of maintenance. Instead of building reusable software, we should try to build disposable software.

I don't need to tell you that deleting code is more fun than writing it.

To write code that is easy to delete: repeat yourself to avoid creating dependencies, but don't repeat yourself to manage them. Layer your code too: build simple-to-use APIs out of simpler-to-implement but clumsy-to-use parts. Split your code: isolate the hard-to-write and likely-to-change parts from the rest of the code, and from each other. Don't hard-code every choice, and allow a few of them to be changed at runtime. Don't try to do all of these things at the same time, and maybe don't write so much code in the first place.

Step 0: Don't write code

The number of lines of code doesn't tell us much on its own, but the order of magnitude does: 50, 500, 5,000, 10,000, 25,000, and so on. A million-line monolith is going to be far more annoying to work with than a ten-thousand-line program, and significantly more time, money, and effort to replace.

Although the more code you have the harder it is to get rid of, saving one line of code saves absolutely nothing on its own.

Even so, the easiest code to delete is the code you avoided writing in the first place.

Step 1: Copy-paste code

Building reusable code is much easier to do in hindsight, with a couple of examples of use already in the code base, than in foresight of the ones you might want later. On the plus side, you are probably already reusing a lot of code just by using the file system, so why worry so much? A little redundancy is healthy.

It's fine to copy-paste code a couple of times rather than write a library function just to give the usage a name. Once you make something a shared API, you make it harder to change.

The code that calls your function will rely on both the intentional and the unintentional behaviours of the implementation behind it. The programmers using your function will not rely on what you document, but on what they observe.
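
To make that concrete, here is a small sketch (the helper and its caller are invented for illustration, not taken from the article): the sorted order of the returned list is an accident of the implementation, yet the caller quietly comes to depend on it.

```python
# Illustrative only: how callers end up depending on observed behaviour
# rather than documented behaviour.

def load_user_names(path):
    """Documented contract: return the user names found in `path`."""
    with open(path) as f:
        names = {line.strip() for line in f if line.strip()}
    # Sorting is an implementation detail, not part of the documented contract.
    return sorted(names)

def first_user(path):
    # This caller silently relies on the observed (sorted) ordering. Change
    # `sorted(names)` to `list(names)` above and this "breaks", even though
    # the documented contract never promised any ordering at all.
    return load_user_names(path)[0]
```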

It's simpler to delete the code inside a function than it is to delete the function itself.

Step 2: Don't copy-paste code

When you have copied and pasted something enough times, it may be time to pull it up into a function. This is the "save me from my standard library" stuff: "open a config file and give me a hash table", "delete this directory". It includes functions with no state at all, or functions with a little bit of global knowledge such as environment variables. It's the stuff that ends up in a file called "util".
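
As a rough sketch of what tends to land in such a file (the names and the JSON config format here are my own assumptions, not the article's):

```python
# util.py -- small, mostly stateless helpers that wrap the standard library.
import json
import os
import shutil

def read_config(path):
    """Open a config file and give me back a dictionary (assumes JSON)."""
    with open(path) as f:
        return json.load(f)

def delete_directory(path):
    """Delete this directory, doing nothing if it is already gone."""
    if os.path.isdir(path):
        shutil.rmtree(path)

def env_flag(name, default=False):
    """A helper with a little global knowledge: environment variables."""
    value = os.environ.get(name)
    return default if value is None else value.lower() in ("1", "true", "yes")
```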

Aside: make a util directory and keep the different utilities in different files. A single util file will always grow until it is too big and yet too hard to split apart. Using a single util file is unhygienic.

The less specific the code is to your application or project, the easier it is to reuse and the less likely it is to change or be deleted: library code for logging, third-party APIs, file handles, or processes. Other good examples of code you are not going to delete are lists, hash tables, and other collections. Not because their interfaces tend to be simple, but because their scope doesn't grow over time.

Instead of making all of the code easy to delete, we are trying to keep the hard-to-delete parts as far away as possible from the easy-to-delete parts.

Step 3: Write more boilerplate

Although we write libraries to avoid copy-pasting, we often end up writing a lot more code through copy-paste in order to use them, but we give that code a different name: boilerplate. Boilerplate is a lot like copy-pasting, except that each time you use it you change some of the code in a different place, rather than repeating exactly the same thing over and over.

As with copy-paste, we duplicate parts of the code to avoid introducing dependencies and to gain flexibility, and we pay for it in verbosity.

Libraries that require boilerplate are usually things like network protocols, wire formats, or parsing kits: anything where it is hard to interweave policy (what a program should do) and protocol (what a program can do) without limiting the options. This kind of code is hard to delete: talking to other computers or handling different file formats is usually a necessity, and the last thing we want to do is litter it with business logic.

Writing boilerplate is not an exercise in code reuse: we are trying to keep the parts that change frequently apart from the parts that are relatively static, and to minimise the dependencies and responsibilities of library code even if we have to write boilerplate to use it.
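
To make "boilerplate" concrete, here is a hedged sketch using only Python's standard library: each call site repeats the same http.client wiring, and only the policy bits (host, timeout, headers, error handling) differ. The hosts and endpoints are placeholders.

```python
# The protocol plumbing is repeated at each call site so that policy stays
# local to the caller and introduces no shared dependency (illustrative only).
import http.client
import json

def fetch_users():
    conn = http.client.HTTPSConnection("api.example.com", timeout=5)  # policy: short timeout
    try:
        conn.request("GET", "/v1/users", headers={"Accept": "application/json"})
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"users endpoint returned {resp.status}")
        return json.loads(resp.read())
    finally:
        conn.close()

def fetch_report():
    conn = http.client.HTTPSConnection("reports.example.com", timeout=60)  # policy: long timeout
    try:
        conn.request("GET", "/v1/report", headers={"Accept": "text/csv"})
        resp = conn.getresponse()
        return resp.read().decode("utf-8")
    finally:
        conn.close()
```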

You are writing more lines of code, but you are writing those lines in the easy-to-delete parts.

Step 4: Don't write boilerplate

Boilerplate works best when libraries are expected to cater to all tastes, but sometimes there is just too much duplication. That is when it is time to wrap your very flexible library in one that has opinions about policy, workflow, and state. Building simple-to-use APIs is about turning your boilerplate into a library.

This is more common than you might think: requests, one of the most popular and best-loved Python HTTP client modules, is a successful example of wrapping the more laborious-to-use urllib3 and offering users a simpler interface. requests caters to the common workflows of using HTTP and hides many of the practical details from the user, while urllib3 handles pipelining and connection management and hides nothing from the user.
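
The contrast is easy to see in a sketch (the URL is a placeholder, and both third-party libraries are assumed to be installed):

```python
# With urllib3 you drive the machinery yourself: pool, headers, decoding.
import json
import urllib3

pool = urllib3.PoolManager()
resp = pool.request("GET", "https://api.example.com/v1/users",
                    headers={"Accept": "application/json"}, timeout=5.0)
users = json.loads(resp.data.decode("utf-8"))

# With requests the common workflow is a single line; the details are hidden.
import requests

users = requests.get("https://api.example.com/v1/users", timeout=5).json()
```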

When we wrap one library in another, it is not so much that we are hiding detail as that we are separating concerns: requests is about popular HTTP adventures, and urllib3 is about giving you the tools to choose your own adventure.

I am not advocating that you go and create a /protocol/ and a /policy/ directory, but you should try to keep your util directory free of business logic, and to build easy-to-use libraries on top of easy-to-implement ones. You don't have to finish writing one library before you start writing another on top of it.

Wrapping third-party libraries is usually good practice too, even when they are not protocol-like. You can build a library that suits your code, rather than locking the whole project into one choice. Building a pleasant-to-use API and building an extensible API are often at odds with each other.
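
For example, a project might route all of its HTTP calls through one small module of its own instead of importing requests everywhere. The module below is my own sketch, not something the article prescribes; the point is that swapping the underlying library later means editing one file rather than the whole project.

```python
# web.py -- the only module in the project that imports the HTTP library.
# Callers depend on this tiny interface, not on the vendor's API, so swapping
# requests for urllib3, httpx, or a test stub touches this file alone.
import requests

def get_json(url, timeout=5):
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    return resp.json()

def post_json(url, payload, timeout=5):
    resp = requests.post(url, json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()
```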

Separating concerns like this lets us make some users happy without making what other users want impossible. Layering is easiest when you start with a good API, but building a good API on top of a badly written one is hard. Good APIs are designed with empathy for the programmers who will use them, and layering is recognising that we can't please everyone at once.

Layering is less about writing code we can delete later than about making the hard-to-delete code pleasant to use (without contaminating it with business logic).

Step 5: Write a big lump of code

You've copy-pasted, you've refactored, you've layered, you've composed, but the code still has to do something at the end of the day. Sometimes it's best just to give up and write a substantial lump of trashy code to hold the rest together.

Business logic is the kind of code characterised by an endless series of edge cases and quick and dirty hacks. This is fine. I am OK with this. Other styles, like "game code" or "founder code", are the same thing: cutting corners to save a considerable amount of time.

The reason? Sometimes it's easier to delete one big mistake than to delete eighteen smaller, interleaved ones. A lot of programming is exploratory, and it's quicker to get it wrong a few times and iterate than to try to get it right the first time.

This is especially true of the more fun or creative endeavours. If you are writing your first game, don't write a game engine. Likewise, don't write a web framework before you have written an application. Go ahead and write a mess the first time: unless you're psychic, you won't know how to split it into modules.

Monorepos involve a similar trade-off: you won't know in advance how to split up your code, and one big mistake is frankly easier to deploy than twenty tightly coupled ones.

When you know which code is going to be thrown away, deleted, or replaced, you can cut a lot more corners. Especially when you are building a one-off client site or a page for an event, or anywhere you have a template and stamp out copies, or fill in the gaps left by a framework.

I am not saying you should write the same thing ten times over, perfecting your mistakes. To quote Perlis: "Everything should be built top-down, except the first time." You should try to make new mistakes each time, take on new risks, and slowly build up through iteration.

Becoming a professional software developer is the process of accumulating a back catalogue of regrets and mistakes. You learn nothing from success. It is not that you know what good code looks like, but that the scars of bad code are fresh in your mind.

Projects either fail or become legacy code eventually anyway. Failure happens more often than success. It is quicker to write ten big balls of mud and see where they get you than to try to polish a single turd.

It's easier to delete all of the code in one go than to delete it piece by piece.

Step 6: Break your code into pieces

A big lump of code is the easiest to write, but the most expensive to maintain. A change that looks simple ends up touching almost every part of the code base in an ad hoc fashion. What was easy to delete as a whole is now impossible to delete piece by piece.

In the same way that we layered our code according to unrelated responsibilities, from platform-specific code to domain-specific code, we need to find a way to tease apart the logic on top.


[Start] with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others. D. Parnas


Instead of splitting code into modules with common functionality, we break it apart by what it does not share with the rest. We isolate the parts that are the most frustrating to write, maintain, or delete from one another.

We are not building modules in order to reuse them, but in order to be able to change them.
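
A minimal sketch of that idea, with names invented for illustration: the decision "user records live in a JSON file" is known only to this one module, so changing or deleting that decision does not ripple out into the callers.

```python
# user_store.py -- hides a single design decision: how user records persist.
# Callers only see save_user/find_user; switching to SQLite or a web service
# later means rewriting this file, not the code that uses it.
import json
import os

_PATH = os.environ.get("USER_STORE_PATH", "users.json")

def _load():
    if not os.path.exists(_PATH):
        return {}
    with open(_PATH) as f:
        return json.load(f)

def save_user(user_id, record):
    users = _load()
    users[user_id] = record
    with open(_PATH, "w") as f:
        json.dump(users, f)

def find_user(user_id):
    return _load().get(user_id)
```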

Unfortunately, some problems are more intertwined and harder to separate than others. Although the single responsibility principle says that "each module should only handle one hard problem", it is more important that "each hard problem is only handled by one module".

When a module does two things, it is usually because changing one part requires changing the other. A badly written component with a simple interface is often easier to use than two components that require careful coordination between them.


I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["loose coupling"], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the code base involved in this case is not that. SCOTUS Justice Stewart


A system in which you can delete one module without having to rewrite the others is usually described as loosely coupled, but it is a lot easier to explain what loose coupling looks like than to build it in the first place.

Even hardcoding a variable once can be loose coupling, and so can using a command-line flag for it. Loose coupling is about being able to change your mind without having to change too much code.
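
A minimal sketch with illustrative names: the choice lives in exactly one place, and it can later be promoted to an environment variable or a command-line flag without touching the code that uses it.

```python
# settings.py -- the decision lives in exactly one place (illustrative sketch).
import argparse
import os

# Hardcoded once: every caller imports REQUEST_TIMEOUT from here, so changing
# our mind means editing one line, or setting one environment variable...
REQUEST_TIMEOUT = float(os.environ.get("REQUEST_TIMEOUT", "5.0"))

def parse_args(argv=None):
    # ...or promoting the same choice to a command-line flag, again without
    # touching any of the code that uses the value.
    parser = argparse.ArgumentParser()
    parser.add_argument("--request-timeout", type=float, default=REQUEST_TIMEOUT)
    return parser.parse_args(argv)
```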

For example, Microsoft Windows has internal and external APIs for exactly this purpose. The external APIs are tied to the lifecycle of desktop programs, while the internal APIs are tied to the kernel. Hiding the internal APIs gives Microsoft flexibility without breaking too much software along the way.

HTTP has examples of loose coupling too: put a cache in front of your HTTP server, or move your images to a CDN and change only the links to them. Neither will break the browser.

HTTP's error codes are another example of loose coupling: common problems across servers have their own codes. When you get a 400, trying again will give you the same result; a 500 may change. As a result, HTTP clients can handle many errors on the programmer's behalf.
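
A hedged sketch of the kind of generic handling this makes possible, using the requests library and an invented retry policy:

```python
# Retry 5xx responses (the server's state may change); never retry 4xx
# responses (repeating the same bad request gives the same answer).
import time
import requests

def get_with_retries(url, attempts=3, backoff=1.0):
    for attempt in range(attempts):
        resp = requests.get(url, timeout=5)
        if resp.status_code < 500:
            return resp                         # success, or a 4xx that retrying cannot fix
        time.sleep(backoff * (2 ** attempt))    # 5xx: wait, then try again
    return resp
```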

How a piece of software handles failure has to be taken into account when you break it into smaller pieces. That is easier said than done.


"I have decided, reluctantly, to use LaTeX." Making reliable distributed systems in the presence of software errors, Armstrong, 2003


Erlang/OTP is relatively unique in how it handles failure: supervision trees. Roughly speaking, every process in an Erlang system is started and watched by a supervisor. When a process runs into a problem, it exits; when it exits, its supervisor restarts it.

(The supervisors themselves are started by a bootstrap process, and when a supervisor runs into a fault, the bootstrap process restarts it.)

The idea is that failing fast and restarting is quicker than handling every error. Error handling like this can seem counter-intuitive, gaining reliability by giving up when errors happen, but restarting is a remarkably good cure for transient faults.
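
The following is not Erlang/OTP, just a toy Python sketch of the same shape: the worker fails fast by raising, and a supervisor loop restarts it a bounded number of times (the restart policy is invented for illustration).

```python
# Toy supervisor: restart the worker when it crashes, instead of trying to
# handle every possible error inside the worker itself.
import time

def supervise(worker, max_restarts=5, delay=1.0):
    restarts = 0
    while True:
        try:
            worker()            # the worker runs until it raises ("fails fast")
            return              # clean exit: nothing left to supervise
        except Exception as error:
            restarts += 1
            if restarts > max_restarts:
                raise           # escalate, much as an OTP supervisor eventually does
            print(f"worker crashed ({error!r}); restart {restarts}")
            time.sleep(delay)   # brief pause, then turn it off and on again
```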

Error handling and recovery are best done in the outer layers of your code base. This is known as the end-to-end principle: it is easier to handle failure at the far ends of a connection than anywhere in the middle. Even if you handle errors in the middle layers, the final check at the top cannot be skipped, and if the top has to handle errors anyway, why bother handling them on the inside?

Error handling is one of the ways a system can become tightly bound together. There are many other examples of tight coupling, but it is a little unfair to single one design out as bad. Except for IMAP.

In IMAP every operation is a snowflake, with its own unique options and handling. Error handling is painful: errors can show up halfway through the result of another operation.

Instead of UUIDs, IMAP uses unique tokens to identify each message, and these too can change halfway through an operation. Many operations are not atomic. It took twenty-five years to find a reliable way to move an email from one folder to another. It also has its own special UTF-7 encoding, and its own unique base64 encoding.

I am not making any of this up.

By comparison, file systems and databases are far better examples of remote storage. In a file system the set of operations is fixed, but there is a multitude of objects to operate on.

Although SQL may look like a much broader interface than a file system, it follows the same pattern: a number of operations on sets, and a multitude of rows to operate on. Although you can't always swap one database out for another, it is easier to find something that works with SQL than with any homebrew query language.

Other examples of loose coupling are systems with middleware, filters, and pipelines. For example, Twitter's Finagle services share a common API, which lets generic timeout handling, retry mechanisms, and authentication be added to client and server code with very little effort.

(I am quite sure that if I didn't mention the UNIX pipeline here, someone would complain at me.)
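
A rough Python sketch of that shape, inspired by (not copied from) Finagle's filters: a service is a callable from request to response, and a filter wraps any service with generic behaviour and returns another service.

```python
# A "service" is a callable: request -> response. A "filter" wraps a service
# and is itself a service, so generic behaviour composes onto anything.
import time

def timing_filter(service):
    def timed(request):
        start = time.monotonic()
        try:
            return service(request)
        finally:
            print(f"request took {time.monotonic() - start:.3f}s")
    return timed

def retry_filter(service, attempts=3):
    def retried(request):
        last_error = None
        for _ in range(attempts):
            try:
                return service(request)
            except Exception as error:   # real code would catch something narrower
                last_error = error
        raise last_error
    return retried

def echo_service(request):
    return {"echo": request}

# Generic behaviour is layered on without the service knowing anything about it.
service = timing_filter(retry_filter(echo_service))
print(service({"hello": "world"}))
```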

First we layered our code, but now some of those layers share an interface: a common set of behaviours and operations with a variety of implementations. Good examples of loose coupling are usually examples of uniform interfaces.

A healthy code base doesn't have to be perfectly modular. The modular parts are what make writing code fun, in the same way that Lego bricks are fun because all of the pieces fit together. A healthy code base has some verbosity and some redundancy, and just enough distance between its moving parts that you won't trap your hands between them.

Loosely coupled code isn't necessarily easy to delete, but it is much easier to replace, and much easier to change too.

Step 7: Keep writing code

If you can write new code without having to deal with the old code, it becomes far easier to test new ideas. It isn't so much that you must write small modules and avoid large programs, but that your system needs to be able to support one or two experiments on top while you carry on with normal development.

Feature flags are one way to let yourself change your mind later. Although feature flags are seen as a way to test different features, they also let you apply changes without redeploying your software.

Google Chrome is a good example of the benefits this brings. The Chrome developers found that the hardest part of keeping to a fixed release cycle was merging long-lived feature branches.

Being able to switch new code on and off without recompiling means that large changes can be broken down into smaller merges without affecting the existing code. And when new features show up in the code base earlier, it becomes much more obvious when a long-running piece of feature work is going to affect other parts of the system.

A feature flag isn't just a command-line switch: it is a way of decoupling feature releases from merging branches, and of decoupling feature releases from deploying code. Being able to change behaviour while the software is running becomes ever more important when an update can take hours, days, or even weeks to roll out. Ask anyone in operations: any system that can wake you up in the middle of the night is worth being able to control at runtime.
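
A deliberately small sketch of the mechanism (the flag name and the file-based storage are my own assumptions, not Chrome's): the flag is read at runtime, so the new path can be merged early, turned on gradually, and deleted once the decision settles.

```python
# feature_flags.py -- runtime switches, decoupled from merges and deploys.
import json
import os

def _load_flags():
    # Re-read on every check so an operator can flip a flag without a deploy.
    path = os.environ.get("FEATURE_FLAGS_FILE", "flags.json")
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def is_enabled(name, default=False):
    return bool(_load_flags().get(name, default))

def render_old_checkout(cart):   # stand-ins for the real implementations
    return f"old checkout, {len(cart)} items"

def render_new_checkout(cart):
    return f"new checkout, {len(cart)} items"

def render_checkout(cart):
    # Both paths are merged into the code base; the flag decides which is live,
    # and the old path can be deleted once the flag has been on for a while.
    if is_enabled("new_checkout"):
        return render_new_checkout(cart)
    return render_old_checkout(cart)
```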

It isn't so much that you are iterating as that you have a feedback loop. Modules are less about isolating components for reuse than about isolating them for change. Handling change isn't just building new features; it is also getting rid of old ones. Writing extensible code is hoping that in three months' time you will turn out to have got everything right. Writing code you can delete works on the opposite assumption.

The strategies I have talked about here (layering, isolation, common interfaces, composition) are not about writing good software so much as about building software that can change over time.


The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. [...] Hence plan to throw one away; you will, anyhow. Fred Brooks


You don't have to throw it all away, but you will need to delete some of it. Good code isn't about getting it right the first time. Good code is just legacy code that doesn't get in the way.

Good code is code that is easy to delete.






Acknowledgments

Thank you to all of my proof readers for your time, patience, and effort.

Further Reading

Layering/Decomposition

On the Criteria To Be Used in Decomposing Systems into Modules, D.L. Parnas.

How To Design A Good API and Why it Matters, J. Bloch.

The Little Manual of API Design, J. Blanchette.

Python for Humans, K. Reitz.


Common Interfaces

The Design of the MH Mail System, a Rand technical report.

The Styx Architecture for Distributed Systems

Your Server as a Function, M. Eriksen.


Feedback loops/Operations lifecycle

Chrome Release Cycle, A. Laforge.

Why Do Computers Stop and What Can Be Done About It?, J. Gray.

How Complex Systems Fail, R. I. Cook.


The technical is social before it is technical.

All Late Projects Are the Same (from Software Engineering: An Idea Whose Time Has Come and Gone?), T. DeMarco.

Epigrams in Programming, A. Perlis.

How Do Committees Invent?, M.E. Conway.

The Tyranny of Structurelessness, J. Freeman.


