Today, we're going to do a technical deep-dive into how Tor really works. No mention of how to access Tor, no mention of what might be on Tor. This is how Tor works, without speculation and without exaggeration of what Tor is. Just a deep-dive into the technical side.

This article is designed to be read by anyone, with **ZERO** knowledge of networking or Tor. Let's dive right in.

## What is Tor?

The United States Naval Research Laboratory developed The Onion Routing Protocol (Tor) to protect U.S. intelligence communications online. Ironically, Tor has seen widespread use by everyone - even those organisations which the U.S. Navy fights against.

You may know Tor as the home of online illegal activities, a place where you can buy any drug you want, a place for all things illegal. Tor is much larger than what the media makes it out to be. According to King's College, much of Tor is legal.

When you normally visit a website, your computer makes a direct TCP connection with the website's server. Anyone monitoring your internet connection could read the TCP packet. They can find out what website you're visiting, your IP address, and what port you're connecting to.

If you're using HTTPS, no one will know what the message said. But sometimes all an adversary needs to know is who you're connecting to.

Using Tor, your computer never communicates with the server directly. Tor creates a twisted path through 3 Tor nodes and sends the data via that circuit.

The core principle of Tor is onion routing, a technique for anonymous and secure communication over a public network. In onion routing, messages are encapsulated in several layers of encryption.
Like an onion, a message going through Tor has layers. Each layer is a layer of encryption: you add several layers of encryption to a Tor message, as opposed to just one. This is why it's called The Onion Routing Protocol, because it adds layers at each stage.

The resulting onion (the fully encapsulated message) is then transmitted through a series of computers in a network (called onion routers), with each computer peeling away a layer of the "onion". This series of computers is called a path. Each layer contains the next destination - the next router the packet has to go to. When the final layer is decrypted, you get the plaintext (the non-encrypted message).

The original author remains anonymous because each node in the network is only aware of the preceding and following nodes in the path (except the first node, which does know who the sender is but doesn't know the final destination).

This has led to attacks where large organisations with expansive resources run servers to attempt to be the first and last nodes in the network. If the organisation's server is the first node, it knows who sent the message. If the organisation's server is the last node, it knows the final destination and what the message says.

Now we have a basic overview of Tor, let's start exploring how each part of Tor works. Don't worry if you're confused, every part of Tor will be explained using gnarly diagrams.

## Overview
Clients choose a *path* through the network and build a circuit where each onion router in the path knows its predecessor and successor, but no other nodes in the circuit. Paths and circuits are synonyms.

The original author (the question mark on the far left) remains anonymous, unless you're the first node in the path, as you know who sent you the packet.

No one knows what data is being sent until it reaches the last node in the path, which knows the data but doesn't know who sent it. The second-to-last node in the path doesn't know what the data is; only the last node does.

This has led to attacks whereby large organisations with expansive resources create Tor servers which aim to be the first and last onion routers in a path. If the organisation can do this, they get to know who sent the data and what data was sent, effectively breaking Tor.

Oh no! Now a large organisation knows you watch Netflix.

It's incredibly hard to do this without being physically close to the location of the organisation's servers; we'll explore this more later.

Throughout this article I'll be using Netflix as a normal service (Bob) and Amazon Prime Video as the adversary (Eve). In the real world, this is incredibly unlikely to be the case. I'm not here to speculate on what organisations might want to attack Tor, so I've used 2 unlikely examples to avoid the political side of it.

Each packet flows down the network in fixed-size cells. These cells have to be the same size so none of the data going through the Tor network looks suspiciously big. Each cell is unwrapped with a symmetric key at each router and then relayed further down the path.

Let's go into Tor itself.

## Tor Itself
Tor needs a lot of users to create anonymity. If Tor was hard to use, new users wouldn't adopt it so quickly, and with fewer users Tor becomes less anonymous. By this reasoning, usability isn't just a design choice of Tor but a security requirement.

Tor has had to make some design choices that may not improve security but do improve usability, in the hope that an improvement in usability is an improvement in security.

## What Tor Isn't

Tor is not a completely decentralised peer-to-peer system like many people believe it to be. If it was completely peer-to-peer, it wouldn't be very usable. Tor requires a set of directory servers that manage and keep the state of the network at any given time.

Tor is not secure against end-to-end attacks. An end-to-end attack is where an entity controls both the first and last node in a path, as discussed earlier. This is a problem that cyber security experts have yet to solve, so Tor does not have a solution to it.

Tor does not guarantee that the sender's identity stays hidden. In 2013, during the final exams period at Harvard, a student tried to delay an exam by sending in a fake bomb threat. The student used Tor and Guerrilla Mail (a service which allows people to make disposable email addresses) to send the bomb threat to school officials.

The student was caught, even though he took precautions to make sure he wasn't.

Guerrilla Mail sends an originating IP address header along with the email, so the receiver knows where the original email came from.
With Tor, the student expected the IP address to be scrambled, but the authorities knew it came from a Tor exit node (Tor keeps a list of all nodes in the directory service). So the authorities simply looked for people who were accessing Tor (within the university) at the time the email was sent.

Tor isn't an anonymising service, but it is a service that can encrypt all traffic from A to B (so long as an end-to-end attack isn't performed).

Tor is also incredibly slow, so using it for Netflix isn't a good use case.

## The Difference Between Tor and a VPN

When you use a VPN, the VPN forwards all your internet traffic to the appropriate destination, and the traffic between your computer and the VPN is encrypted. All your internet service provider can see is encrypted traffic heading from your computer to the VPN. They can't see inside your packets. They don't know who you're talking to - other than the VPN.

VPNs aren't private in the same way that Tor is. VPNs protect you against ISPs or local adversaries (ones monitoring your laptop's WiFi). But they don't protect you from themselves.

The VPN is the man in the middle. It knows who you are and who you're talking to. Depending on the traffic, the VPN also decrypts your packets, meaning it knows everything. With a VPN, you have to trust it.

With Tor, you don't have to place much trust in any single party. If one of the nodes in our graph earlier was an adversary, they would only know our IP address or our data packet, not both. One rogue node is survivable. VPNs expect you to trust them; Tor protects you from the Tor network itself.

No one, apart from you, should know both the IP addresses of the origin and destination and the contents of the message.

Now that we have a good handle on what Tor is, let's explore onion routing.

## Onion Routing

Given the network above, we are going to simulate what Tor does.
Your computer is the one on the far left, and you're sending a request to watch *Stranger Things* on Netflix (because what else is Tor used for). This path of nodes is called a *circuit*. Later on, we're going to look into how circuits are made and how the encryption works. But for now we're trying to generalise how Tor works.

We start off with the message (we haven't sent it yet). We need to encrypt the message N times (where N is how many nodes are in the path). We encrypt it using AES, a symmetric-key cryptosystem. The key is agreed using Diffie-Hellman. Don't worry, we'll discuss all of this later.

There are 4 nodes in the path (not counting your computer and Netflix), so we encrypt the message 4 times.

Our packet (onion) has 4 layers: blue, purple, orange, and teal. Each colour represents one layer of encryption.

We send the onion to the first node in our path. That node then removes the first layer of encryption.

Each node in the path knows the key to decrypt its layer (via Diffie-Hellman). Node 1 removes the blue layer with the symmetric key that you both agreed on.

Node 1 knows you sent the message, but the message is still wrapped in 3 layers of encryption, so it has no idea what the message is.

As it travels down the path, more and more layers are stripped away. The next node does not know who sent the packet. All it knows is that Node 1 sent it the packet, and that it's to be delivered to Node 3.

Now Node 3 strips away a layer.

The final node knows what the message is and where it's going, but it doesn't know who sent it. All it knows is that Node 3 sent it the message; it doesn't know about anyone else in the path. One of the key properties here is that once a node decrypts a layer, it cannot tell how many more layers there are to decrypt. It could be as few as 1 or 2 or as many as 200 layers of encryption.

Now there's no way Amazon can find out you watch Netflix! Netflix sends back a part of Stranger Things.
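If it helps to see the layering as code, here is a minimal sketch of the walk-through above. It is a toy under loud assumptions: the "cipher" is a SHA-256 keystream XORed over the message, standing in for AES so that the example runs with only the standard library, and the keys are random bytes rather than keys agreed via Diffie-Hellman. The function names (`keystream`, `xor_layer`, `wrap`, `route`) are mine, not Tor's. One honest caveat: XOR layers can be removed in any order, so this toy shows the layering idea but not the strict ordering a real cipher gives you.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. A stand-in for AES, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer. XOR is its own inverse, so one function does both."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, keys: list[bytes]) -> bytes:
    """Client side: encrypt for the last node first, so Node 1's layer ends up outermost."""
    onion = message
    for key in reversed(keys):
        onion = xor_layer(key, onion)
    return onion


def route(onion: bytes, keys: list[bytes]) -> bytes:
    """Each node in turn peels exactly one layer with the key it shares with the client."""
    for key in keys:
        onion = xor_layer(key, onion)
    return onion


# Hypothetical session keys, one per node (in Tor these come from Diffie-Hellman).
node_keys = [secrets.token_bytes(32) for _ in range(4)]
request = b"GET /stranger-things"

onion = wrap(request, node_keys)                 # what leaves your computer
after_node_1 = xor_layer(node_keys[0], onion)    # Node 1 peels the outer layer
delivered = route(onion, node_keys)              # what the last node ends up with

print(after_node_1 == request)   # Node 1 still can't read the message
print(delivered == request)      # the last node recovers the plaintext
```

The client holds all four keys, while each node holds only its own. That is the whole trick: only the client can build or fully open the onion.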
Let's see how it works in reverse.

Node 4 adds its layer of encryption now. It doesn't know who originally made the request; all it knows is that Node 3 sent it the request, so it sends the response back to Node 3.

And so on for the next few nodes.

Now the response packet is fully encrypted. The only one who still knows what the message contains is Node 4. The only one who knows who made the request is Node 1. Now that we have the fully encrypted response back, we can use all the symmetric keys to decrypt it.

You might be thinking "I've seen snails faster than this" and you would be right. This protocol isn't designed for speed, but at the same time it has to care about speed.

The algorithm could be much slower but much more secure (using entirely public-key cryptography instead of symmetric-key cryptography), but the usability of the system matters. So yes, it's slow. No, it's not as slow as it could be. It's all a balancing act.

The encryption used is normally AES, with the key being shared via Diffie-Hellman.

The paths Tor creates are called circuits. Let's explore how Tor chooses what nodes to use in a circuit.

## How Is a Circuit Created?

Each machine, when it wants to create a circuit, chooses the exit node first, followed by the other nodes in the circuit. Tor circuits are always 3 nodes. Increasing the length of the circuit does not create better anonymity. If an attacker owns the first and last nodes in the circuit, you can have 1500 nodes in the circuit and it still wouldn't make you more secure.

When Tor selects the exit node, it selects it following these principles:
All paths in the circuit obey these rules:
If you choose the same node twice, it's guaranteed that the node will be in the guard position (the node you enter at), the exit position, or both, and these are the dangerous positions. In a 3-node circuit the repeated node can sit in positions (1, 2), (2, 3) or (1, 3), so if each pair is equally likely there is a 1 in 3 chance of it being both the guard and the exit node, which is even more dangerous. We want to avoid the entry / exit attacks.

This isn't okay. The node colour changes to show it's the same node.
Operators who run more than 1 Tor node can choose to declare their nodes as a "family". This means that the nodes all have the same parent (the operator of their network). This is again a countermeasure against the entry / exit attacks, although operators do not have to declare a family if they don't wish to. If they want to become a guard node (discussed soon), it is recommended to declare family, although not required.

Not allowed.
Subnets define networks. IPv4 addresses are made up of 4 octets (32 bits in total). As an example, Google's IP address in binary is:

`01000000.11101001.10101001.01101010`

The first 16 bits (the /16 subnet) are `01000000.11101001`, which means that if a node with this address is already in the path, Tor does not choose another node whose address starts with the same 16 bits. Again, a countermeasure to the entry / exit attacks.

Not allowed.

If subnets sound confusing, I've written this Python code to help explain them:

```python
# IP addresses are written in binary here, not the usual base 10.
# Subnets are usually powers of 2; this one is 2^4.
IP = "01000000.11101001.10101001.01101010"
subnet = 16

# This will store the subnet address once we find it.
subnet_ip = []
counter = 0

for char in IP:
    # We want to end the loop when we reach the subnet number.
    if counter >= subnet:
        break
    # The IP address separates each octet of bits with full stops.
    # We don't want to count a full stop as a bit,
    # but we want to include it in the final subnet.
    if char == ".":
        subnet_ip.append(".")
    else:
        # Otherwise it is a bit, so we append it and increment the counter.
        subnet_ip.append(char)
        counter += 1

print("Subnet is " + "".join(subnet_ip))
```
Non-running means the node currently isn't online. You don't want to pick things that aren't online. Non-valid means that some configuration in the node's torrc is wrong. You don't want to accept strange configurations in case they are trying to hack or break something.
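To tie the rules just discussed together (no repeated node, no two nodes from one family, no two nodes in the same /16 subnet, nothing non-running or non-valid), here is a small sketch of a path checker. It is an illustration under assumptions, not Tor's implementation: the `Relay` class, its field names and the `path_ok` function are all hypothetical, and it ignores the case where a client has been configured to allow non-running or non-valid nodes. The standard library's `ipaddress` module does the /16 work for us.

```python
from dataclasses import dataclass
from typing import Optional
import ipaddress


@dataclass(frozen=True)
class Relay:
    """Hypothetical relay record. The field names are mine, not Tor's."""
    identity: str                  # stand-in for a relay's unique identity
    ip: str                        # IPv4 address in dotted decimal
    family: Optional[str] = None   # operator-declared family label, if any
    running: bool = True
    valid: bool = True


def slash16(ip: str):
    """Return the /16 network that an IPv4 address belongs to."""
    return ipaddress.ip_network(f"{ip}/16", strict=False)


def path_ok(path: list[Relay]) -> bool:
    """Check a candidate path against the four rules discussed above."""
    # Rule: skip relays that are offline or misconfigured.
    if any(not (relay.running and relay.valid) for relay in path):
        return False
    # Compare every pair of relays in the path once.
    for i, a in enumerate(path):
        for b in path[i + 1:]:
            if a.identity == b.identity:                        # same node twice
                return False
            if a.family is not None and a.family == b.family:   # same family
                return False
            if slash16(a.ip) == slash16(b.ip):                  # same /16 subnet
                return False
    return True


guard = Relay("A", "64.233.169.106")
middle = Relay("B", "198.51.100.7")
exit_relay = Relay("C", "203.0.113.9")
same_subnet = Relay("D", "64.233.1.1")   # shares its first 16 bits with the guard

print(path_ok([guard, middle, exit_relay]))    # True
print(path_ok([guard, middle, same_subnet]))   # False
```

A real client picks nodes one at a time and filters candidates as it goes; checking a finished path like this is just the easiest way to show all the rules in one place.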
A guard node is a privileged node because it sees the real IP of the user. It's "expensive" to become a guard node (you must maintain a high uptime for weeks and have good bandwidth). This is possible for large companies who have 99.9% uptime and high bandwidth (such as Netflix). Tor has no way to stop a powerful adversary from registering a load of guard nodes.

Right now, Tor is configured to stick with a single guard node for 12 weeks at a time, so you choose roughly 4 new guard nodes a year. This means that if you use Tor once to watch Amazon Prime Video, it is relatively unlikely for Netflix to be your guard node. Of course, the more guard nodes Netflix creates, the more likely it is. Although, if Netflix knows you are connecting to the Tor network to watch Amazon Prime Video, they will have to wait up to 12 weeks (until you pick a new guard) for their suspicions to be confirmed, unless they attack the guard node and take it over.

Becoming a guard node is relatively easy for a large organisation. Becoming the exit node is slightly harder, but still possible. We have to assume that the large organisation has infinite computational power to be able to do this. The solution is to make the attack highly expensive with a low rate of success.

The more regular users of Tor, the harder it is for a large organisation to attack it. If Netflix controls $\frac{50}{100}$ nodes in the network, the chance of you choosing a guard node from Netflix is 50%. If suddenly 50 more normal user nodes join, then that's $\frac{50}{150}$, reducing the probability of Netflix owning your guard node (and thus, a potential attack) to about 33% and making the attack even more expensive.

There is strength in numbers within the Tor service.

## Guard Pinning

When a Tor client starts up for the first time, it chooses a small, random set of guard nodes. For the next few months, it makes sure each circuit uses one of these pre-selected nodes as its guard node.

The official proposal from the Tor documentation states:
> Introduction and motivation
>
> Tor uses entry guards to prevent an attacker who controls some fraction of the network from observing a fraction of every user's traffic. If users chose their entries and exits uniformly at random from the list of servers every time they build a circuit, then an adversary who had (k/N) of the network would deanonymize F=(k/N)^2 of all circuits... and after a given user had built C circuits, the attacker would see them at least once with probability 1-(1-F)^C. With large C, the attacker would get a sample of every user's traffic with probability 1.
>
> To prevent this from happening, Tor clients choose a small number of guard nodes (currently 3). These guard nodes are the only nodes that the client will connect to directly. If they are not compromised, the user's paths are not compromised.
>
> But attacks remain. Consider an attacker who can run a firewall between a target user and the Tor network, and make many of the guards they don't control appear to be unreachable. Or consider an attacker who can identify a user's guards, and mount denial-of-service attacks on them until the user picks a guard that the attacker controls.

Guard node pinning is important because of Tor's threat model. Tor assumes that it may only take a single opening for an adversary to work out who you are talking to, or who you are. Since a single vulnerable circuit can destroy your anonymity, Tor tries to minimise the probability that we will ever construct one or more vulnerable circuits.

Tor guard nodes can be DoS'd, or an attacker could hold a majority share of the guard nodes on the internet when you connect, to try and get you. Guard node pinning aims to make this much harder.

An attacker might work out your guard nodes and shut them down, forcing you to connect to their guard nodes. Or you might connect to a guard node controlled by an adversary. Tor has algorithms in place to try and detect this, outlined here.

## What Is a Directory Node?
The state of the Tor network is tracked and publicised by a group of 9 trusted servers (as of 2019) known as directory nodes, each of which is controlled by a different organisation.

Each node is run by a separate organisation because this provides redundancy and distributes trust. The integrity of the Tor network relies on the honesty and correctness of the directory nodes, so making the network resilient and distributing trust is critical.

Directory nodes maintain a list of currently running relays (publicly listed nodes in the Tor network). Once per hour, the directory nodes publish a consensus together. The consensus is a single document compiled and voted on by each directory node. It ensures that all clients have the same information about the relays that make up Tor.

When a Tor user (a client or a node) wants to know the current state of the network, it asks a directory node. As we'll see later, directory nodes are essential for all parts of Tor, especially hidden services.

Relays keep the directory nodes up to date. They send the directory node(s) a notification whenever they come online or are updated. Whenever a directory node receives a notification, it updates its personal opinion on the current state of the Tor network. All directory nodes then use this opinion to form a consensus of the network.

Let's now look at what happens when disagreements arise in the directory services when forming a consensus.

The first version of Tor took a simple approach to conflict resolution. Each directory node gave the state of the network as it personally saw it. Each client believed whichever directory node it had spoken to most recently. There was no consensus among the directory nodes.

In Tor, this is a disaster. There was nothing ensuring that directory nodes were telling the truth. If an adversary took over one directory node, they would be able to lie about the state of the network.
If a client asked this adversary-controlled directory node for the state of the network, it would return a list containing only nodes that the adversary controlled. The client would then connect to these adversary nodes.

The second version of the Tor directory system made this attack harder. Instead of asking a single directory node for its opinion, clients asked every directory node and combined their opinions into a consensus. But clients could form differing views of the network depending on when they had last spoken to each directory node. This gave way to statistical information leakage - not as bad as Tor 1.0. Besides, every client had to talk to every directory node, which took time and was expensive.

The third and current version of the directory system moved the responsibility of calculating a consensus from clients to directory nodes.

## What Are Bridge Nodes?

I'm not sure if you saw it earlier, but I made the distinction between nodes that are listed in the directory services and nodes that aren't.

If a repressive state wants to block Tor, it uses the directory nodes. Directory nodes keep up-to-date lists of Tor relay nodes and are publicly available for anyone to download. The state can query a directory node for a list of active Tor relays and censor all traffic to them.

Tor keeps an up-to-date listing of countries where it is possibly blocked (censored), if you're interested.

Tor helps its users circumvent the censorship by hiding the fact they are using Tor. It does this through a proxy known as a bridge node. Tor users send their traffic to the bridge node, which forwards the traffic on to the user's chosen guard nodes.

The full list of bridge nodes is never published, making it difficult for states to completely block Tor. You can view some bridge nodes here. If this doesn't work, Tor suggests:
It's possible to block Tor another way. Censoring states can use Deep Packet Inspection (DPI) to analyse the shape, volume, and feel of each packet. Using DPI, states can recognise Tor traffic, even when it is encrypted or goes to unknown IP addresses.

To circumvent this, Tor developers have made Pluggable Transports (PT). These transform the Tor traffic flow between the client and the bridge. In the words of Tor's documentation:

> This way, censors who monitor traffic between the client and the bridge will see innocent-looking transformed traffic instead of the actual Tor traffic. External programs can talk to Tor clients and Tor bridges using the pluggable transport API, to make it easier to build interoperable programs.

Ever heard those rumours: "there are websites on the dark web, on Tor, that when you visit them you'll see people doing nasty things, selling illegal things or worse: watching The Hangover Part 3"?

When people talk about these websites, they are talking about Tor Hidden Services.

These are a wild concept and honestly deserve an entire blog post of their own. Hidden services are servers, like any normal computer server.

Except that with a Tor Hidden Service, it is possible to communicate without the user and server knowing who each other are.

The device (the question mark) knows that it wants to access Netflix, but it doesn't know anything about the server, and the server doesn't know anything about the device that's asked to access it. This is quite confusing, but don't worry, I'm going to explain it all with cool diagrams.

When a server is set up on Tor to act as a hidden service, the server sends a message to some selected onion routers asking if they want to be an introduction point to the server. It is entirely up to the server who gets chosen as an introduction point, although usually it asks 3 routers to be its introduction points.

The introduction points know that they are going to be introducing people to the server.
The server will then create something called a hidden service descriptor, which has a public key and the IP address of each introduction point. It will then send this hidden service descriptor to a distributed hash table, which means that every onion router (not just the introduction points) will hold some part of the information about the hidden service.

If you try to look up a hidden service, the node responsible for it in the hash table will give you the full hidden service descriptor, including the addresses of the hidden service's introduction points.

The key for this hash table is the onion address, and the onion address is derived from the public key of the server.

The idea is that the onion address isn't publicised over the whole Tor network; instead you find it another way, like from a friend telling you or on the internet (addresses ending in .onion).

The way that the distributed hash table is programmed means that the vast majority of the nodes won't know what the descriptor is for a given key.

So almost every single onion router will have minimal knowledge about the hidden service unless they explicitly want to find it.

Let's say someone gave you the onion address. To access it, you first request the descriptor from the hash table, and you get back the service's introduction points: let's say 4 or 5 IP addresses of introduction nodes. You pick one at random, let's say the top one.

You're going to ask the introduction point to introduce you to the server, but instead of making a connection directly to the server, you pick a rendezvous point at random in the network from a given set of onion routers.

This should say "Tor node". I've lost the files for these graphs (thanks LucidChart). Terribly sorry I can't update this.
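A quick aside on the claim that the onion address is derived from the server's public key. Here is a minimal sketch of how that derivation works for version 3 onion addresses, as I understand the format: the address is the base32 encoding of the 32-byte public key, a 2-byte checksum and a version byte, followed by `.onion`. I'm assuming the version 3 format here (older, shorter addresses were derived differently), the function name `onion_address_v3` is mine, and the key below is placeholder bytes rather than a real public key.

```python
import base64
import hashlib


def onion_address_v3(public_key: bytes) -> str:
    """Derive a version 3 style .onion address from a 32-byte public key."""
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte public key")
    version = b"\x03"
    # The checksum is the first 2 bytes of SHA3-256 over a fixed label, the key and the version.
    checksum = hashlib.sha3_256(b".onion checksum" + public_key + version).digest()[:2]
    # 32 + 2 + 1 = 35 bytes, which base32 encodes to exactly 56 characters (no padding).
    encoded = base64.b32encode(public_key + checksum + version).decode("ascii").lower()
    return encoded + ".onion"


# Placeholder bytes standing in for a real public key (an assumption for illustration).
example_key = bytes(range(32))
address = onion_address_v3(example_key)

print(len(address))   # 62
print(address)
```

Because the address is computed from the key, anyone holding the address can check that a descriptor really belongs to that service, without a central registry of names.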
You then make a circuit to that rendezvous point and send it a message asking if it can introduce you to the server using the introduction point you just chose. You also send the rendezvous point a one-time password (in this example, let's use "Labrador").

The rendezvous point makes a circuit to the introduction point and sends it the word "Labrador" and its IP address.

The introduction point sends the message to the server, and the server can choose to accept it or do nothing.

If the server accepts the message, it will then create a circuit to the rendezvous point.

The server sends the rendezvous point a message. The rendezvous point looks at both messages, from your computer and from the server. It says "well, I've received a message from this computer saying it wants to connect with this service and I've also received a message from the service asking if it can connect to a computer, therefore they must want to talk to each other".

The rendezvous point will then act as another hop on the circuit and connect them.

In short, a hidden service works like this, taken from here:
Tor protects its users from analysis attacks, where the adversary wants to know who Alice is talking to. Yet Tor does not protect against confirmation attacks. In these attacks, the adversary aims to answer the question "Is Alice talking to Bob?"

Confirmation attacks are hard and need a lot of preparation and resources. The attacker needs to be able to track both ends of the circuit, either by directly tracking each device's internet connection or by tracking the guard and exit nodes.

If Alice sends a packet like this:

```
# (timestamp, size, port, protocol)
(17284812, 3, 22, SSH)
```

And Bob receives this packet, the attacker can see that the packets are the same - even though the attacker cannot see what the packet contains, as it is encrypted. Does Bob tend to receive packets at the same time that Alice sends them? Are they the same size? If so, it is reasonable to infer that Alice and Bob are communicating with each other.

Tor breaks data up into fixed-size chunks for a reason - to try and prevent this kind of thing. Tor is working on padding all packets to make this harder. They're discussing adding packet order randomisation too, but this is too costly at the moment.

The Tor browser does add some extra defences, such as reordering packets. If Alice sends the packets A, B, C and Bob receives them as B, A, C, it is harder to detect that they are the same. It's not foolproof, but it does become harder.

An attack where the attacker tries to control both ends of the circuit is called a Sybil attack, named after the main character of the book Sybil by Flora Rheta Schreiber. We discussed some of this earlier, where an attacker controls both the guard and exit nodes.

Sybil attacks are not theoretical. In 2014, researchers at Carnegie Mellon University appeared to successfully carry out a Sybil attack against the real-life Tor network.

When Lizard Squad, a group of hackers, tried to perform a Sybil attack, a detection system raised the alarm.
Tor has built-in monitoring against these kinds of events, and the project is working on more sophisticated monitoring against Sybil attacks.

In 2007, Dan Egerstad, a Swedish security consultant, revealed he had intercepted usernames and passwords sent through Tor by running an exit node. At the time, these were not TLS or SSL encrypted.

Interestingly, Dan Egerstad had this to say on the Tor nodes:
Tor does not normally hide the fact that you are using Tor. Many websites (such as BBC's iPlayer, or Wikipedia when editing) block you when you use a known Tor node.

Some applications, under Tor, reveal your true IP address. One such application is BitTorrent.

Jansen et al. described an attack where they DDoS exit nodes. By degrading the network (removing exit nodes), an attacker increases the chance of one of their own nodes being chosen as the exit node.

Tor users who visit a site twice, once on Tor and once off, can be tracked. The way you move your mouse is unique. There is a JavaScript time measurement bug report on the Tor project that shows how it's possible to monitor mouse movements on a site (even when on Tor). Once you fingerprint someone twice, you know they're the same person.

It should be noted that Tor browser offers 3 levels of security (located in the settings). The highest security level disables JavaScript, some images (as they can be used to track you) and some fonts too. The lesson is, if you want high-security Tor, use the high-security setting.

Now, all these attacks sound cool. But that's not how most Tor users are caught. Most Tor users make mistakes and are caught because of themselves.

Take Dread Pirate Roberts, founder of the Silk Road dark marketplace. He gave himself away by posting about it on social media.

Most Tor users are caught (if they're doing illegal things) because of bad operational security, and not normally because of a security issue with Tor.

It's worth repeating the story we saw earlier.

In 2013, during the final exams period at Harvard, a student tried to delay an exam by sending in a fake bomb threat. The student used Tor and Guerrilla Mail (a service which allows people to make disposable email addresses) to send the bomb threat to school officials.

The student was caught, even though he took precautions to make sure he wasn't.
Guerrilla Mail sends an originating IP address header along with the email, so the receiver knows where the original email came from. With Tor, the student expected the IP address to be scrambled, but the authorities knew it came from a Tor exit node (Tor keeps a list of all nodes in the directory service), so the authorities looked for people who were accessing Tor (within the university) at the time the email was sent.

If this person had gone to a coffee shop or something, he probably would have been fine.

There's a fantastic talk from DEFCON 22 about how Tor users got caught. None of the stories mentioned were caused by Tor, but rather by bad OpSec.

{{ youtube eQ2OZKitRwc }}

## Conclusion

Tor is a fascinating protocol full of algorithms that have been refined over the years. I've come to appreciate Tor, and I hope you have too.

Here is a list of things we've covered:
If you want to learn more, check out the paper on Tor titled "Tor: The Second-Generation Onion Router".