Date: Fri, 30 Dec 2016 18:37:47 +0100
From: Domagoj Stolfa <domagoj.stolfa@gmail.com>
To: freebsd-dtrace@freebsd.org
Subject: RFC: Changes in DTrace to allow for distributed operation
Message-ID: <20161230173747.GB46006@freebsd-laptop>
Hello,

I have been working on extending DTrace to allow for a natural way of tracing
in a distributed environment. This would consist of being able to trace events
on different virtual machines, remote servers one has access to, cluster nodes
and so on. I will summarize the changes I have made and those I am still
considering, and outline the design tradeoffs, flaws and merits of each, in
the hope of getting feedback from others interested in distributed tracing.

The following abbreviations will be used:

instance -> operating system instance, running on a VM or on bare metal
UUIDv1   -> universally unique identifier, version 1, as per RFC 4122
UUIDv5   -> universally unique identifier, version 5, as per RFC 4122
host     -> the DTrace instance running on the machine that issued the
            DTrace script
DDAG     -> distributed directed acyclic graph

I will start off with a struct added in the kernel as a part of the DTrace
framework:

    typedef struct dtrace_instance {
            char *dtis_name;
            struct dtrace_provider *dtis_provhead;
            struct dtrace_instance *dtis_next;
            struct dtrace_instance *dtis_prev;
    } dtrace_instance_t;

where:

dtis_name -> instance name
dtis_provhead -> first provider in the instance
dtis_next, dtis_prev -> doubly linked list nodes

- Each instance is identified by its name, which implies that once an instance
  with a given name is created, all other instances with that name will be
  identified equally on the host.
- Each new instance is added at the start of the list and becomes the new list
  head.

Merits:
- Identifying instances by their name allows for an easy transition between
  the framework and the scripts one would be writing.
- There is no redundancy in the list, which means less memory is used and
  fewer indirections are needed when traversing the list and when looking up
  probes in the hash in order to identify which instance they belong to.

Flaws:
- This does not identify the instance that fired the probe in a unique way.
  In order to get this information the provider needs to be known (this is,
  however, known from the dtrace_probe struct). The problem with this approach
  arises when we want to send the appropriate information one level up
  (towards the host): what needs to be sent is the probe ID, which then needs
  to be mapped to the appropriate ID on the host.
- Using just the instance name is not enough to identify which instance the
  provider/probe belongs to.

Possible resolution:
- A probe ID could be sent over to the host, with the DTrace framework changed
  so that the dtrace_probes array is no longer kept globally but per instance,
  in the dtrace_instance struct. This would make it easy to identify the
  instance in which the probe needs to be fired and would eliminate the need
  for the additional hash table.
- In order to identify the instance that a provider belongs to, a UUID could
  be kept in the way explained further below. Additionally, the dtpv_next
  pointer could be repurposed so that it no longer links all providers, but
  only the providers within an instance. This could be accomplished by keeping
  a list of providers for each instance in the dtrace_instance struct, or
  alternatively by changing the semantics of the provider list so that it is
  easy to identify which providers belong to which instance.
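To make the intended list semantics concrete, here is a rough userspace sketch
of instance creation and lookup. It is only an illustration of the design
above: the function names, the malloc-based allocation and the absence of any
locking are placeholders, not the actual kernel code.

    #include <stdlib.h>
    #include <string.h>

    struct dtrace_provider;                     /* opaque in this sketch */

    /* Same layout as the struct shown above. */
    typedef struct dtrace_instance {
            char *dtis_name;
            struct dtrace_provider *dtis_provhead;
            struct dtrace_instance *dtis_next;
            struct dtrace_instance *dtis_prev;
    } dtrace_instance_t;

    static dtrace_instance_t *dtrace_instances; /* list head */

    /*
     * A new instance is prepended and becomes the new list head.
     */
    static dtrace_instance_t *
    dtrace_instance_create(const char *name)
    {
            dtrace_instance_t *inst;

            if ((inst = calloc(1, sizeof (*inst))) == NULL)
                    return (NULL);
            inst->dtis_name = strdup(name);
            inst->dtis_next = dtrace_instances;
            if (dtrace_instances != NULL)
                    dtrace_instances->dtis_prev = inst;
            dtrace_instances = inst;
            return (inst);
    }

    /*
     * Instances are identified purely by name; the first match wins.
     */
    static dtrace_instance_t *
    dtrace_instance_lookup(const char *name)
    {
            dtrace_instance_t *inst;

            for (inst = dtrace_instances; inst != NULL; inst = inst->dtis_next) {
                    if (strcmp(inst->dtis_name, name) == 0)
                            return (inst);
            }
            return (NULL);
    }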
Another thing that needs to change is the way providers are identified. In a
distributed setting it is not sufficient to identify a provider by its memory
address, which is what DTrace currently does. This can be done through
combined use of UUIDv1 and UUIDv5.

- Each provider would have a corresponding UUID assigned to it, starting at
  the endpoint. The endpoint would advertise its namespace-local UUID (a
  UUIDv1 in this case) one level up. That instance would then generate a
  namespace-local UUID for the providers that originate from the instance that
  has just advertised its UUID. The UUID in this case would be a UUIDv5,
  combining the UUIDv1 generated at the endpoint with the name of the
  instance. The UUIDv5 generated on that node would be kept as a
  namespace-local UUID on each provider that originated from the endpoint.
  This would then be advertised one more level up, again generating a UUIDv5.
  Using this, two DDAGs would be built implicitly.

This can be demonstrated on the following topology:

                         VM{0...n}{0...m}
                        /
               VM{0...n}
              /
            P1
            |
            |
           /
      H ----- P2 - VM{0...n} - VM{0...n}{0...m}
           \
            |
            |
            Pk
              \
               VM{0...n}
                        \
                         VM{0...n}{0...m}

where P{1}, ..., P{k} are bare-metal machines, VM{0}, ..., VM{n} top-level
virtual machines and VM{i}{0}, ..., VM{i}{m} nested virtual machines in the
i-th top-level virtual machine.

The nested virtual machines VM{i,j} would generate their own UUIDv1 for all
their providers. This is guaranteed to be unique because DTrace locks every
time it creates a new provider. Following that, each of the providers from
VM{i,j} would be advertised to its corresponding virtualization host, VM{i}.
VM{i} would then generate a UUIDv5 for each of the providers that were
advertised from VM{i,j}; the namespace name used could be the name of the VM,
which guarantees the uniqueness of each UUIDv5 generated on VM{i}.
Furthermore, each VM{i} would then advertise its providers (including the
providers that were advertised from the nested VMs VM{i,j}) to P{x}. P{x}
would generate the UUIDv5 in the same fashion and finally advertise to H,
which would then have all the providers from the different machines. The
difference when P{x} advertises to H is that the VM name cannot be used,
because P{x} is a bare-metal machine connected to H over the network. One
could use the public IP address (assuming no anycast), hostname and/or port
here.

In order to be able to identify these different machines, two UUIDs would need
to be stored in the dtrace_provider struct: a namespace-local UUID generated
on the host machine, and the provider UUID that was generated on the machine
that advertised the provider, so that the graph can then be traversed.
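For reference, this is roughly what the name-based derivation looks like: a
minimal userspace sketch of the RFC 4122 version-5 construction, using
OpenSSL's SHA-1 purely for illustration. The function name and the raw
16-byte UUID representation are assumptions for this sketch, not existing
DTrace code.

    #include <stdint.h>
    #include <string.h>
    #include <openssl/sha.h>

    /*
     * RFC 4122, section 4.3: derive a version-5 (name-based, SHA-1) UUID
     * from a namespace UUID and a name.  In the scheme above, "ns" would
     * be the UUID advertised by the child instance (16 bytes, network
     * byte order) and "name" the child instance's name (or, for a
     * bare-metal peer, its hostname or address).
     */
    static void
    uuid5_derive(const uint8_t ns[16], const char *name, uint8_t out[16])
    {
            SHA_CTX ctx;
            uint8_t digest[SHA_DIGEST_LENGTH];

            SHA1_Init(&ctx);
            SHA1_Update(&ctx, ns, 16);                /* namespace UUID */
            SHA1_Update(&ctx, name, strlen(name));    /* name in that namespace */
            SHA1_Final(digest, &ctx);

            memcpy(out, digest, 16);                  /* truncate the 20-byte digest */
            out[6] = (out[6] & 0x0f) | 0x50;          /* version 5 */
            out[8] = (out[8] & 0x3f) | 0x80;          /* RFC 4122 variant */
    }

Applying this at every hop (VM{i,j} -> VM{i} -> P{x} -> H) gives each level a
namespace-local UUID for a provider while preserving the link back to the UUID
it was derived from, which is what allows the graph to be traversed.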
This would form a DDAG in the direction of tracing information flow from the
perspective of VM{i,j}: H would get information from VM{i,j}, but there should
be no way for VM{i,j} to get any information from H in terms of data that is
local to H. H could identify exactly which instance has fired the probe.

Another DDAG would be formed in the opposite direction and would be used to
instruct other instances what to do. These actions could be DTrace destructive
actions, asking for identification of a certain machine, and similar things.
It is important that this is indeed a DDAG, as there should be no possibility
for such a request to circle back around to the host.

Additionally, in case of conflicts, UUID pocketing could be employed, simply
storing the identifying information in that form.

This approach requires restructuring the DTrace Provider-to-Framework API.
Namely, there needs to be a way to tell DTrace what instance is being
registered and what instance a probe is firing in, and a way to index them.
This can be made backwards-compatible. Consider the following example of
ensuring that no changes need to be made to the existing providers for correct
operation of DTrace: dtrace_register() becomes dtrace_distributed_register(),
where the former is implemented with the latter by simply passing in the
instance as "host".

Merits:
- Allows for a concise way of storing the identifying information on the host,
  allowing DTrace operations such as dtrace_register() and dtrace_probe() to
  operate much as they do now, with instance-awareness included. These
  operations could be implemented very efficiently.
- Easily scalable to an arbitrary number of nodes.

Flaws:
- The instances need to be trusted. There is room for malicious operation of
  these instances in the proposed approach if the deployment is arbitrary.
- While the existing DTrace operations can still be performed efficiently,
  each of them accumulates additional instructions, resulting in a larger
  probe effect. This might prove problematic for some critical tasks and adds
  complexity to DTrace.

Possible resolution:
- For virtual machines, VMI could be employed. This could help verify whether
  or not the virtual machines are operating in a non-malicious manner.

Many of these things are subject to change. This approach has mainly evolved
from the goal of tracing virtual machines with DTrace through bhyve. The
details of how the interoperability between the DTrace instances would be
implemented have been intentionally left out, as they are outside the scope of
this RFC email (though I am more than willing to provide the information on
the virtual machine side should it be needed).

-- 
Best regards,
Domagoj Stolfa.