Abstract:
With the volume of AI problems involving human knowledge likely to soar, crowdsourcing has become essential in a wide range of World Wide Web applications. One of the biggest challenges in crowdsourcing is aggregating the answers collected from crowd workers, and many aggregation techniques have been proposed to address it. However, it is difficult for users to choose the best-suited technique and appropriate parameter values for a particular application, since the performance of each technique depends on various factors (e.g., worker expertise, question difficulty). In this paper, we develop a benchmarking tool that (i) simulates the crowd and (ii) evaluates aggregation techniques along different dimensions (accuracy, robustness to spammers, etc.). We believe that this tool can serve as a practical guideline for both researchers and software developers: researchers can use it to assess existing or new algorithms, while developers can reuse its components to reduce development complexity.