Package-manager-analysis

From UVOO Tech Wiki

Cryptofex Package Manager

Objective

The purpose of the package manager is to provide an easy way for developers to publish and include libraries and source code into their projects.

NuGet accomplishes a similar goal for C# developers and will act as a rough example of what the project should do and how it should work.

Unlike package managers for FOSS, Cryptofex will also allow developers to sell licenses to source code. This is of special importance to Cryptofex because it will capture revenue from a percentage of these sales. This solves two problems in the market:

  1. Providing package management for smart contracts.
  2. Establishing a market for non-commercial sales.

Previous Work

We have determined that there are two major styles of package managers. Wikipedia refers to them as the following:

  1. System-level package managers: manage operating system dependencies, such as programs, dylibs, man pages, etc. Examples: Dpkg, Nix, Apt, Yum, Homebrew, Macports.
  2. Application-level package managers: manage packages for a particular program or programming language, such as plugins, source code etc. Examples: Atom packages, stack, yarn, npm, pip, rubygems.

Both styles solve a similar problem (downloading archives of files and installing them), but the associated challenges are very different. This is partially due to scale. A system-level package manager must ensure that newly installed libraries do not break any existing software. At the application level, each project can be treated as its own sandbox: installations in one project do not affect the rest of the system. Dependency conflicts still exist, but fewer dependencies are involved and they only need to be resolved within the project.

Other challenges differ in kind. For example, system-level managers must be very concerned with user permissions.

Package managers also tend to be designed closely around the kind of content they serve. As we shall see, a method which works well for JavaScript can be impossible for C. It is therefore very important to consider our application, which is primarily Ethereum smart contracts. Solidity, the main Ethereum contract language, handles imports much like C does, so a structure similar to that of a C package manager would work well.

Nix OS

Nix is a system-level package manager for Unix systems. It is of particular interest because it is a hermetic package manager. This means that a package's metadata fully describes its dependencies, so that installations and removals can be guaranteed to have no effect on other system components. It represents the ideal for system-level package managers, as it combats dependency hell.

Nix information:

Unfortunately, Nix is not available on Windows. Proposals for this have been sitting on GitHub without progress for years (example 1, example 2). This report describes how to install it through Cygwin; however, it is a large enough task to be a project in itself. Others have installed it through WSL, but this would require the user to install a Linux distro image. Installing an additional multi-gigabyte OS is too much to ask for our application.

Porting Nix is a possibility; however, it relies on many Unix-specific features, such as symlinks, having a default compiler, etc.

Yarn/NPM

Yarn and NPM are application-level package managers for JavaScript. Like Nix, they are hermetic, but they solve the hard problems by avoiding them entirely. Each module receives an entire copy of its own dependency tree, and within that module, imports only search its own dependency subtree. No attempt is made to share dependencies, which is affordable because modules usually consist of a few text files. One advantage of this approach is that dependencies can easily use different versions of the same library without conflict.

Recall the diamond problem, in which two dependencies rely on different versions of the same package, such as:

 / B \
      D (1.0)
A
      D (1.5)
 \ C /

The following directory structure would be created:

A/
  a.js
  B/
    b.js
    D/ (contains 1.0)
  C/
    c.js
    D/ (contains 1.5)

B and C would import their own separate copy of D. Note that this is also a runtime feature of JavaScript. In C++ this would not work, because including two copies of D into the same process would result in symbol conflicts. Even if there were no diamond problem, the NPM model would create multiple copies, making this unusable for C and C++.
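
As a rough illustration of the nested layout above, the following Python sketch computes which directories an NPM-style install would create for the diamond example. The function name nested_layout and the placeholder versions for A, B and C are assumptions made for this sketch; only the two versions of D come from the example.

```python
from typing import Dict, List, Tuple

Package = Tuple[str, str]  # (name, version)

# Dependency graph from the diamond example. The versions of A, B and C are
# placeholders invented for this sketch; only D's versions come from the text.
GRAPH: Dict[Package, List[Package]] = {
    ("A", "0.0"): [("B", "0.0"), ("C", "0.0")],
    ("B", "0.0"): [("D", "1.0")],
    ("C", "0.0"): [("D", "1.5")],
}


def nested_layout(
    root: Package, graph: Dict[Package, List[Package]], prefix: str = ""
) -> Dict[str, str]:
    """Map each install directory to the package version it would contain.

    Every package gets a private copy of each dependency beneath its own
    directory, so nothing is shared between siblings. Cycles are not handled.
    """
    name, version = root
    path = f"{prefix}{name}/"
    layout = {path: version}
    for dependency in graph.get(root, []):
        layout.update(nested_layout(dependency, graph, prefix=path))
    return layout


LAYOUT = nested_layout(("A", "0.0"), GRAPH)
```

Here LAYOUT contains separate entries for A/B/D/ and A/C/D/, which is exactly the duplication that JavaScript tolerates and a C-style linker does not.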

Yarn and NPM also use semantic versioning (major.minor.patch) to allow packages to automatically receive minor updates to their dependencies. For example, I might declare a dependency on json with the range ~1.0.0, which means accept any patch release with major version 1 and minor version 0. Early on, this was a huge problem for NPM, because installing dependencies could produce a different outcome each time it was run. Yarn solved this problem by creating a lock file (yarn.lock), which lists the full version identifiers of the entire dependency tree. Users must then explicitly update this file (although updates are still constrained by the semantic version range). NPM has since adopted the same idea with package-lock.json.
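
The interaction between version ranges and a lock file can be sketched in a few lines of Python. This is a simplified model, not NPM's or Yarn's actual resolver: the function names, the REGISTRY contents and the idea of representing the lock file as a dictionary are all assumptions for illustration. It only handles a range that accepts any patch release of a given major and minor version.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Split 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def resolve_patch_range(available: List[str], base: str) -> str:
    """Pick the highest available version sharing base's major and minor."""
    wanted = parse(base)
    matching = [
        candidate
        for candidate in (parse(v) for v in available)
        if candidate[:2] == wanted[:2] and candidate[2] >= wanted[2]
    ]
    if not matching:
        raise LookupError(f"no version satisfies the patch range of {base}")
    return ".".join(str(part) for part in max(matching))


def make_lock(ranges: Dict[str, str], registry: Dict[str, List[str]]) -> Dict[str, str]:
    """Record the exact version chosen for every declared dependency."""
    return {name: resolve_patch_range(registry[name], base) for name, base in ranges.items()}


# Hypothetical registry contents, invented for this sketch.
REGISTRY = {"json": ["1.0.0", "1.0.3", "1.1.0", "2.0.0"]}
LOCK = make_lock({"json": "1.0.0"}, REGISTRY)

# A new patch release appears later. Resolving again would now pick it up,
# but an install that reads LOCK keeps using the recorded version.
REGISTRY["json"].append("1.0.4")
```

Without the lock, two installs run before and after the new release would disagree; with it, the result changes only when the user regenerates the lock.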

Atom, a similar Electron-based IDE, uses NPM. However, Atom packages are mostly JavaScript plugins for the IDE itself.

One concern is that Yarn and NPM require CouchDB on the server to manage the registry. Choosing Yarn/NPM would therefore dictate both our client and our server solution, making it more difficult to change either in the future. We want to build a robust backend system, with the possibility that the client eventually changes.

Solidity Language Imports

Yarn and NPM are not compatible with Solidity because of the symbol conflict issue discussed above.

In Solidity, contracts are included with an import "file.sol" statement. This dumps all of the file's symbols into the global namespace. You can also use a safer namespaced inclusion: import "file.sol" as file. However, file itself becomes part of the global namespace, and eventually all contracts are included into a "main" contract which is the root of the tree. Here is an example we set up:

rhoc.sol

import "./erc20.sol";
import "./math2/math.sol" as math;

erc20.sol

import "./math1/math.sol" as math;

This causes a conflict, because there are two versions of math in the global namespace. A consequence of this is that it is not feasible for a dependency to use a different version of a shared dependency, because the symbols collide. This is further discouraged because all Solidity files must be marked with compatible pragma compiler versions in order to be compiled together.

Renaming

The idea of changing the import name to avoid collisions was brought up. In the previous example rhoc.sol could import .. as math2 and avoid the math collision. This works well for user code, but in the case of something like the diamond problem the renaming would also have to occur in the libraries.

 / B \
      D (1.0)
A
      D (1.5)
 \ C /

In B, rename import ... as D to as D1_0. In C, rename import ... as D to as D1_5.

This is certainly a possibility, but as the example shows, it potentially requires modifying the code of all the libraries, not just user code. (Is this a good idea?) It becomes a mathematical problem of assigning a unique prefix to the import statements of each node in the tree, for example using 0 for left or 1 for right.
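
One way to picture the prefix assignment is to derive each alias from the node's position in the dependency tree. The Python sketch below is only an illustration of that idea: the tree encoding, the function name assign_aliases and the underscore-separated alias format are assumptions, and it generalizes "0 for left, 1 for right" to the index of each child.

```python
from typing import Dict, List, Tuple

# A node is (import name, list of child nodes). This is the diamond example.
DIAMOND = ("A", [("B", [("D", [])]), ("C", [("D", [])])])


def assign_aliases(node, path: Tuple[int, ...] = ()) -> Dict[Tuple[int, ...], str]:
    """Give every node an alias built from its name and its path from the root.

    The path is the sequence of child indexes followed from the root, so two
    copies of the same library in different subtrees get different aliases.
    Assumes library names do not already end in such numeric suffixes.
    """
    name, children = node
    suffix = "".join(f"_{index}" for index in path)
    aliases = {path: name + suffix}
    for index, child in enumerate(children):
        aliases.update(assign_aliases(child, path + (index,)))
    return aliases


ALIASES = assign_aliases(DIAMOND)
```

In this sketch the copy of D under B becomes D_0_0 and the copy under C becomes D_1_0, so the two no longer collide. The cost noted above remains: the import statements inside B and C would have to be rewritten to use these names.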

The Unix Linker

C and C++ face the same issue and resolve it with a similar renaming scheme. It is possible to use macros and append a prefix to symbols in the binary. The linker even provides a tool for doing this, but apparently it isn't well supported on multiple operating systems.

This is a fundamental limitation of how C and C++ work. Even Nix can't solve the diamond problem for a single process. This article describes it well:

There can’t be two versions in use of the same function with the same name, even if they are in different libraries (well, it won’t work the way you want it to).

Summary

Yarn and NPM work well for JavaScript, because it is perfectly normal for different dependencies to use differing versions of other dependencies. The runtime system encourages this. However, allowing this in our package manager would break code in languages that behave like C and C++, such as Solidity.

Therefore, we should aim to resolve one version of each dependency per project.

vcpkg

This discussion leads us to a model more like a traditional package manager, such as pip or stack. When a package is installed, the best version of each dependency will be kept in a flat directory. All other packages and the project itself will refer to this version. The best version is the one that works for all dependents.

We propose using vcpkg, a tool developed by Microsoft for managing C and C++ dependencies. vcpkg is an application-level package manager based on CMake. Even though it is designed for C and C++, it can be easily adapted for distributing other kinds of files. vcpkg also supports Windows, Mac, and Linux.
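
The rule "one version that works for everyone" can be sketched as a set intersection. The Python below is an illustration of that rule only, not a description of how pip, stack or vcpkg resolve versions internally; the function names and the representation of requirements as sets of acceptable version strings are assumptions.

```python
from typing import Dict, Set, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.5' or '1.5.2' into a tuple so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_shared_version(acceptable_by_dependent: Dict[str, Set[str]]) -> str:
    """Return the highest version that every dependent can accept.

    acceptable_by_dependent maps each dependent package to the set of
    versions of the shared dependency it is known to work with.
    """
    if not acceptable_by_dependent:
        raise ValueError("no requirements given")
    shared = set.intersection(*(set(v) for v in acceptable_by_dependent.values()))
    if not shared:
        raise ValueError("no single version satisfies every dependent")
    return max(shared, key=version_key)
```

For example, if B accepts D at 1.0 or 1.5 and C accepts D at 1.5 or 2.0, the sketch selects 1.5. If B only accepts 1.0 and C only accepts 1.5, there is no shared version and the conflict has to be reported to the user rather than silently installing both.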

Packages

vcpkg is designed to import existing libraries without requiring any modifications to their repos, such as adding a package info file. Instead, vcpkg includes a ports/ directory which contains download and installation information for each package. This will allow us to support any existing smart contracts on GitHub or GitLab without convincing the authors to use our system. We only need to write a port file for each one.

Example

Control File

Source: eth-dutch-auction
Version: 1.0.0
Description: A dutch auction ethereum contract.
Depends: ds-test

Install File

include(vcpkg_common_functions)

vcpkg_from_github(
    OUT_SOURCE_PATH SOURCE_PATH
    REPO justinmeiners/dutch-auction
    REF 4ffb4779ea9d27c2572e5c7ad2443dc088f20dbd
    SHA512 36f54627c7be36d815cd2b89c7228e309da516357a9bdcff2035f35e5a3767b469a19b10f099c61c6154ab3bdf773fd20cce57131c26feb2d059d7056d28c088
    HEAD_REF master
)

file(INSTALL ${SOURCE_PATH}/dutch_auction.sol DESTINATION ${CURRENT_PACKAGES_DIR}/include)
file(INSTALL ${SOURCE_PATH}/LICENSE DESTINATION ${CURRENT_PACKAGES_DIR}/share/eth-dutch-auction RENAME copyright)

This format supports many operations, such as patching files, unarchiving directories, etc.

Registry

We do not need the C/C++ ports/ included in vcpkg. Instead, we will replace them with our own for smart contracts. This will be done by creating a publicly accessible GitLab project containing the ports. Any GitLab user will be able to add their own contract ports and create a pull request.

Cryptofex will download from this repository when it needs to update the port list (similar to apt-get update).

We will provide an interface in the IDE or on a website that allows users to browse packages in the registry.
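
A browsing interface needs the port metadata in a searchable form. As a minimal sketch, assuming the control files keep the simple "Key: value" layout shown in the example above, the following Python reads one into a dictionary and searches by name or description. The function names and the search behaviour are assumptions for illustration, not part of vcpkg.

```python
from typing import Dict, List

# The control file from the example above, embedded so the sketch is self-contained.
CONTROL_TEXT = """\
Source: eth-dutch-auction
Version: 1.0.0
Description: A dutch auction ethereum contract.
Depends: ds-test
"""


def parse_control(text: str) -> Dict[str, str]:
    """Read 'Key: value' lines into a dictionary, ignoring other lines."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def search(ports: List[Dict[str, str]], term: str) -> List[str]:
    """Return the names of ports whose name or description mentions term."""
    needle = term.lower()
    return [
        port["Source"]
        for port in ports
        if "Source" in port
        and (needle in port["Source"].lower() or needle in port.get("Description", "").lower())
    ]


PORTS = [parse_control(CONTROL_TEXT)]
```

In practice the list of ports would be built by walking the ports/ directory of the registry checkout; that step is omitted here.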

Triplets

Triplets are used to define target platforms in vcpkg. They provide build configurations for CMake, including the target architecture, etc. The defaults are things like x86-windows or x64-linux. For smart contracts we do not need to build native dependencies and will instead define our own triplets, such as ethereum-metropolis and rchain-mercury. This also provides a namespace for contracts in the filesystem.

If we do need native dependencies, they can target the standard triplet definitions.

Cryptofex Projects

Inside Cryptofex, users will list their dependencies in a project file. To begin installation Cryptofex will create a vcpkg instance in their project with the following structure:

vcpkg/
    scripts/
    triplets/
        ethereum..
        rchain...
    ports/
    installed/
        ethereum/
            include/
            share/
    .vcpkg-root

Smart contracts will be installed in the include directory, and documentation will be added to share. The compiler will remap empty imports to the include directory, so that in Solidity code importing a package can be as easy as import "file.sol".

The vcpkg binary will live inside Cryptofex and will be sent commands to install each dependency in this directory.
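
The remapping rule can be sketched as a small path-resolution function. This is a hypothetical illustration of the behaviour described above, not the Solidity compiler's actual import resolution: the function name resolve_import and the rule that only paths starting with ./ or ../ are treated as relative are assumptions.

```python
import posixpath


def resolve_import(import_path: str, importing_file: str, include_dir: str) -> str:
    """Decide which file an import statement refers to.

    Relative imports (./ or ../) are resolved against the importing file's
    directory. Any other import is looked up in the package include directory.
    """
    if import_path.startswith("./") or import_path.startswith("../"):
        base_dir = posixpath.dirname(importing_file)
        return posixpath.normpath(posixpath.join(base_dir, import_path))
    return posixpath.normpath(posixpath.join(include_dir, import_path))
```

With the project layout above, an import of "file.sol" from src/main.sol would resolve to vcpkg/installed/ethereum/include/file.sol, while "./erc20.sol" would still resolve next to the importing file.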

Monetized Packages

Packages which cost money to license cannot be distributed through public repositories. Cryptofex will provide a service to host these packages and manage their licenses. However, their port metadata can still be included in the registry. Access to the download will be behind a paywall, but the metadata will be available.

Alongside vcpkg_from_github we could introduce vcpkg_from_cryptofex to provide a convenient way to create ports from our service.

When vcpkg downloads from the Cryptofex store, an authentication key or token will be included with the request. The server will then match this token to the user's account to ensure they hold a license for the package.

The purchase of a license will take place somewhere else. We like the idea of using the Ethereum network to handle transactions and to actually keep license records on the blockchain. However, this will require further work to define. Regardless of the way transactions are handled, the server will have a way to check licenses and approve or reject downloads.
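
The server-side check can be pictured as two lookups: token to account, then account to licenses. The Python sketch below uses in-memory dictionaries purely for illustration; the class name LicenseServer, its fields and the sample data in the usage note are all hypothetical, and a real service would also need token expiry, secure storage and transport security.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class LicenseServer:
    """Toy model of the download gate, with in-memory storage only."""

    accounts_by_token: Dict[str, str] = field(default_factory=dict)
    licenses_by_account: Dict[str, Set[str]] = field(default_factory=dict)

    def authorize_download(self, token: str, package: str) -> bool:
        """Approve the download only if the token maps to a licensed account."""
        account = self.accounts_by_token.get(token)
        if account is None:
            return False  # unknown or missing token
        return package in self.licenses_by_account.get(account, set())
```

For instance, a server created with accounts_by_token={"tok-123": "alice"} and licenses_by_account={"alice": {"eth-dutch-auction"}} would approve alice's request for eth-dutch-auction and reject requests for any other package or from any other token.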

Outside Cryptofex

Packaging smart contracts is a problem which all DAPP developers are interested in regardless of their build process or platform. If we make sure our package tools work well, independently of Cryptofex, then we can capture revenue from this market. Anyone who wants to purchase a contract from our service should be able to.

Version Management

Versions are currently locked to the state of the ports directory. Each port references a specific commit from an original repository. The official vcpkg ports only care about the most recent versions of software. Users can lock their versions by saving their ports directory.

This simple file system approach gives us a lot of flexibility for how we approach this problem. We can start with the simple approach of vcpkg and create more feature rich version management as needed.

~~Major versions could be handled by including multiple port files. For example python2 and python3 or sqlite3 and sqlite2. Minor versions could be handled in a similar manner as vcpkg. Users would be responsible for tracking down port revisions to get what they want. We could also store many port versions in a database and generate a port directory as needed.~~

The idea behind this simple approach is that the ports tree captures a snapshot of dependencies which are guaranteed to work with each other. If you were to select a version of A and a version of B independently, you would risk running into the diamond problem.

The following is a comment from a vcpkg developer on how multiple versions of the same dependency should be resolved.

We envision this as being handled by three different vcpkg instances. For example, Project A has a local vcpkg\ subdirectory with the commit for X@1.2, Y@2.5, Z@1.0. Since the entire system (including all supporting machinery such as the tool and helper scripts) is versioned together, you get guaranteed reproducibility across machines as long as you check out that same commit for vcpkg.

Of course, this only makes sense when A and B aren't going to be loaded into the same process -- otherwise the diamond dependency on X at different versions will cause undefined behavior for at least one of them! Preventing this diamond clash is why a single vcpkg instance carries a single version for a particular dependency; as long as everyone who wants to share a process shares the same vcpkg instance, you're guaranteed to get consistent versions.

Milestones

Packages will be implemented in several milestones or phases.

  1. Include packages based on open source projects.
    • vcpkg integration in Cryptofex.
    • Easily add/remove packages from a project.
    • Simple port registry hosted in a public repository, such as GitLab.
    • This phase will focus on existing open source projects.
    • Focus on adoption in the community. Medium article?
  2. Online store.
    • Server will host package information and provide ports.
    • Interface for searching for packages.
    • Some packages hosted on Cryptofex servers.
  3. Monetization.
    • Users will be able to create accounts and hold tokens.
    • Users can upload private packages and sell them for tokens.
    • Users can purchase licenses to private packages.

Limitations

The plan above is designed to achieve the goal of distributing smart contracts similar to those in the current Ethereum and RChain ecosystems, which in turn resemble C. Each project has its own dependency tree, which doesn't need to be compatible with any other. This limited way of handling packages can work well for independent programming projects.

Distributing software as a general problem would be better served by something like Nix. In this instance many different applications need to coexist, having their own versions of some dependencies and sharing others.

An Upgrade Path

How big is Nix?

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            106           7450           3136          29694
C/C++ Header                    72           4980           6256          14864
yacc                             1            106             21            555
m4                               2             90              0            298
...
(some hidden)
...
-------------------------------------------------------------------------------
SUM:                           649          19232          10723          68663
-------------------------------------------------------------------------------

Suppose Nix were ported to Windows. Could we easily convert over our existing system?

  • Nix packages are similar to ports, in that they reference git repositories and commits externally. It is even feasible to write a script to convert one format to the other.
  • The licensing server system could remain unchanged. It would still hold repositories behind a paywall that require authentication to download. Nix could just as easily
  • The Cryptofex project files could probably also remain unchanged. The vcpkg folder would be deleted from the project. Versions might not match up and may need to be changed.