Sphere Engine - official handbook

Welcome

This document is the guide for users of our Online Judge based services including:

We focus on the essence of functionality namely the idea of processing source codes submitted into the system. As the user of one of our products you will be the creator of new programming problems.

Let us introduce the convention of naming - the person who creates the problem is the problem setter or the author and the person who use the service and solves the problem is just the user. All presented information are independent of the type of our service thus we just call it service. Also note that the service is mostly oriented to deal with console applications with standard input / output communication.

We provide simple definitions in dictionary, get familiar with them to avoid future problems with understanding.

Dictionary

General

(Computational) problem
A question that computer is able to solve. Usually the problem includes variables. For example, for a given number n find its any nontrivial factor. The value n is a variable in the problem.
Problem instance (input instance)
The set of data which determines all variables from the computational problem definition. For example number 10 is an instance for the prime number problem (i.e. for a given number n determine if n is a prime number).
Binary problem
Any computational problem with binary (i.e. true or false) answer. For example the prime number is a binary problem.

Service

Problem

The challange task for service’s users which is the basic element of the service and it contains:

  • description text explanation of the problem nature along with input / output specification (see more Problem description)
  • test cases (see more below)
  • master judge (see more below)
Submission
The source code which is potentially the correct solution to the problem.
Test case

The set of data (usually pair of input file and correct output file) used for verification of the user’s submission (see more Test case). It serves as a certificate of the correctness and also contains:

  • time limit the maximum time of the user’s solution execution with the data from input file
  • tast case judge a program that analyses the output generated by user’s solution (usually compares user’s output with correct output) (see more Judges, Writing judges)
Master judge
The program which gathers results from test cases to produce final result of the submission (see more Master judges, Writing master judges).
Submission status
Short information about the user’s submission execution. It can be one of the following: accepted, wrong answer, time limit exceeded, run time error, compile error (see more Statuses).
User
The programmer (developer).
Problem setter
User of the service with the permissions to create and manage problems.

Introduction

To sketch the idea of automatic program correctness verification as simple as possible we first consider classical (manual) methods and after that we present automatic methods. We are going to point differences between these two approaches and indicate advantages of the automatic method.

Manual verification

The problem setter needs to try the code manually i.e. he needs to try it for some simple input data. In that situation even simple programs require at least small human readable interface. Let us consider the following toy example along with the C language solution:

The integer power problem description
Write a program which takes two numbers a and b and returns the value of ab.
int atob(int a, int b)
{
  int i, result = 1;
  for(i=0; i < b; i++)
  {
    result *= a;
  }
  return result;
}

int main()
{
  int a,b;
  printf("Give a: ");
  scanf("%d", &a);
  printf("Give b: ");
  scanf("%d", &b);
  printf("a to the power of b = %d", atob(a,b));
  return 0;
}

After compiling and running that piece of code the problem setter is ready to use the program.

Nevertheless, it is easily seen that manual verification requires autor’s continuous preparation - for each input the autor must know the correct output. That’s the rule for all problems he manages - the author is forced to remember how to verify the corectness of all his problems.

Another issue is that he probably wants to be reliable thus he needs to handle with all boundary, clever and problematic input instances. Some interesting situations can be pointed even for that simple problem above. For example, does a to the power of zero give 1?

Another question is how to deal with time and memory complexity issues? Academic experience of members of our team yields that the automatic judge solutions save plenty of time and greatly improve the quality of verification.

Automatic verification

Moving on to the online judging we will present you how we need to change approach. Now whole the process of the compiling, running and testing is going to be conducted by the machine thus the problem setter needs to clearly specify how the program should communicate - that part we call input / output specification.

Suppose we want to prepare the Integer power problem to automatic judging and suppose the presented source code fulfills desired input/output specification (usually the direction is opposite - first we specify the input and output structure then we implement the solution). It means we need, along with the problem definition, give the information that program should expect two integer numbers on the input and needs to print the information “Print a: Print b: a to the power of b = ab”. Clearly the output specification is highly unnatural.

As you can see we don’t need to have any human readable interface at all. In fact, it is rather obstruction which can even lead to reject correct solutions due to some complicated or nonintuitive input or output requirements.

How simple can the previous Integer power problem could be if we drop all the input / output formating? We don’t want to have any “Give me the value” statements to obtain the input and “Result is” statements in the output thus it would be enough to stay with minimal source code:

int atob(int a, int b)
{
  int i, result = 1;
  for(i=0; i < b; i++)
  {
    result *= a;
  }
  return result;
}

int main()
{
  int a,b;
  scanf("%d %d", &a, &b);
  printf("%d", atob(a,b));
  return 0;
}

Tip

Base on our own experience with online judge beginners we know that it’s the most dangerous when you deal with developers who already have basic knowledge along with some habits. We highly recommend you to put emphasis on input / output specification when you will present your instance of the service to your own users.

We present extended description of input and output specification in the next section.

Finally, let us take a look at the simplified submission flow diagram:

_images/simple_diagram.png

In this diagram we can see how user’s source code go through the system. Firstly it is compiled by a proper compiler (according to the selected programming language) which produces executable file[1]. In the next step created executable is fed with model input data and the system obtains the user’s output file. The final step is the procedure of comparing the user’s output file with the model output file, we call it judging. The judge returns the status of the submission. Note that the diagram is a simplified sketch of the idea and we will extend both the picture and the explanation.

Footnotes

[1]In case of script languages we use proper interpreter to run the code.

Problems

In this section we are going to give you insight into submission flow and online judge problem construction. From the previous section we know basics of the concept of the online judging.

To better understand and make it possible to create own problems we will describe components of the problem:

  • Description
    • The task
    • Input / output specification
    • Input / output examples
  • Test case(s) (possibly many)
    • Input file,
    • Output file
    • Time limit
    • Judge
  • Master judge

Before we begin the analysys of the individual components let us take a look at the complete diagram of submission flow to embed the listed components in overall view.

Full submission flow diagram

The diagram is the extension of the one from the previous section. The main difference is the appearance of the master judge component. Since we have introduced possibility of many test cases (input, output and judge) thus we need the summary component to combine the results from test case judges - we call that summary component master judge.

Every submission needs to have precised programming language to choose proper compiler or interpreter.

The problem description is the crucial part for users to make it possible to solve the problem.

Let us recall the Integer Power problem since we are going to use it as a demonstration example. The task described in the problem is to compute the value of ab for given integers a and b

Description

User must be able to understand the problem to solve it successfully. Description contains complete information needed to solve the problem thus it is the most important feature from the user’s point of view.

The task

The formal text that describes the specificity of the problem usually in mathematical manner.

Tips

To achieve good quality of the description it is a good habit to:
  • graphically illustrate the problem,
  • support a dry theory with simple examples.

Referring to the Integer Power problem we could write the taks as follows:

Example (part 1)

Write the program which takes two numbers a and b and returns the value of ab. For example for a = 3 and b = 4 the result is 34= 81

Input / output specification

The previous part was formal from the mathematical perspective but not syntactically. Automatic judging is only possible when we can make an assumption on the problems’s input and output behaviour. The interior of the user’s program is a black box for us right now. We specify what is the input file structure to make it possible for users to implement proper input reading.

Tip

The specification of input file should be an information that user can rely on.

On the other hand we expect that user produces output file with a predictable structure. Formal specification of the output file is an information that we are going to rely on to verify the correctness of the submission.

Referring to the Integer Power problem let us recall first solution of the problem. We didn’t specify any formal input or output syntax in the previous section. First attempt was the solution with the human readable interface and after that we modified the code to achieve a raw version. We can use that source code as a base for following formal specification:

Example (part 2)

Input
In the only line of the input there will be two integer numbers 1a8 and 0b10 separated by a single space character.
Output
Program should write a single number which is a value of ab.

Important

Typically we create the input / output specification before or independently of implementation.

Input / output examples

Referring to the Integer Power problem we present how we could compose the examples:

Example (part 3)

Example 1
Input
3 4
Output
81
Example 2
Input
7 0
Output
1

Test case(s)

Just as the description was for users only so the test cases are for the machine checker. This is the essence of the automatic judging idea. The vast majority of the usages implements the following schema: there is a model input paired to a model output along with the program which can compare that model output with the user’s output to decide whether user’s answer is good or not.

Input and output files

Input file contains the problem instance and it must be consistent with the input specification. The output file should contain corresponding correct answers formatted in accordance to the output specification.

Note

It is not necessery to write the solution to the program to create the model output file - it can be obtained in any manner, however, it is a good practice to have model solution.

Referring to the Integer Power problem we present how we could prepare following test cases:

Example (part 4)

Test case 1
Input file
3 4
Output file
81
Test case 2
Input file
7 0
Output file
1
Test case 3
Input file
2 10
Output file
1024

Tip

It is recomended to construct the problems that are able to repeat the desired procedure as many times as we want to make it possible to test the user’s submission with one test case. There are many reasons for that approach and for further information please visit good test cases design appendix.

Time limit

We have already pointed that one of the features of online judging is the possiblity of estimating the time complexity. To achieve that the author of the problem has to adjust the time limit for program execution. We cover this topic in testing time complexity appendix.

Example (part 5)

Our toy example problem is much too simple in assumptions to allow us to present example of time limits that distinguish different algorithms thus we put default time limit of 1s. In the next section we present more complex example where we further discuss the time limit which can help to estimate the algorithm quality.

Judge

The judge is a program which process user’s output file after execution. Its task is to establish if the submission passed the test case and potentially also returns the score. When the user’s program pass the test case the returned status is “accepted”.

Usually the judge implementation is reduced to compare the model output file with the user’s output file. We support problem setters with default judges:

  • Ignoring extra whitespaces - compares output files up to extra whitespaces
  • Ignoring floating point errors up to a specific position - allows the floating point numbers to be inaccurate i.e. we can accept the errors up to for example 0.01
  • Exact judge - requires output files to be identical

More information about default judges you can find in the section judges. You find there also information about the score.

Tip

The Ignoring extra whitespaces judge is one of the most popular default choice. It is more liberal for output formating errors which in fact doesn’t affect on the solution semantic correctness.

It is possible to create custom test case judges. The author can implement any kind of verification having full access to the input file, base input file, user’s output file and even user’s source code. For more information visit the section writing test case judges.

Example (part 6)

For the Integer Power problem we decide to use default Ignoring extra whitespaces judge for each test case thus we allow the user to generate extra whitespaces before and after the resulting number ab.

For example when user’s solution prints “   81    ” as a result for “3 4” problem instance it is still correct answer.

Master judge

We have discussed the individual test cases for the problem and established that each of them returns information, i.e. status and the score. The master judge is the component which combines all incoming results obtained from test cases to produce the final result which is the status and the score. You can look again at the submission flow diagram for better understanding.

There are predefined master judges proper for most situations:

  • Generic masterjudge - it gathers information from test case judges and requires each of them to achieve “accepted” as the result to establish final result as the “accepted”.
  • Score is % of correctly solved sets - it is a more liberal masterjudge which allows to accept incomplete solution with the score which is the percentage of correctly solved test cases.

You can learn more in Master judges.

Hint

When you need to use more complex master judge it is possible to create the new one or modify the existing ones. You have access to the source code of default master judges and they can be used as a base for your modifications.

Further information about designing master judges you can find in the section writing master judges.

Example (part 7)

The last missing part for the example we successively improve is the choice of master judge. We created two test cases and there is no need to implement the specific own master judge thus we select default one.

When we need to distinguish the solutions as better or worst (but both correct) we should rather choose Score is % of correctly solved sets but in our situation each test case is a pure verification of correctness (i.e. no performance aspects tested) thus we select Generic masterjudge to force the user’s solution to pass all test cases.

Complete example

We have discussed all components of the problem specification therefore we are able to present whole problem setting:

Complete The Integer Power Example

Title: The Integer Power

Description
The task
Write the program which takes two numbers a and b and returns the value of ab. For example for a = 3 and b = 4 the result is 34= 81
Input / output specification
Input
In the only line of the input there will be two integer numbers 1 |le| a |le| 8 and 0 |le| b |le| 10 separated by a single space character.
Output
Program should write a single number which is a value of ab.
Examples
Example 1
Input
3 4
Output
81
Example 2
Input
7 0
Output
1
Test cases
Test case 1
Input file
3 4
Output file
81

Judge: Ignoring extra whitespaces
Time limit: 1s
Test case 2
Input file
7 0
Output file
1

Judge: Ignoring extra whitespaces
Time limit: 1s

Master judge: Generic master judge

Problem example

In this section we present more complicated example with full details to give you better overall look at abilities of our services. It is still an elementary example but it will tell you much more about the specific of online judging.

Important

In the following example we created test cases according to the advice in the good test cases design appendix.

The problem is to count the sum of numbers from 1 to given n i.e. 1 + 2 + 3 + ... + n, we call it the Initital sum problem. We are going to preper it to handle with multiple input instances in a single test case by proper input / output specification design. Look at the following problem description:

Example - The Initial Sum

For a positive integer n calculate the value of the sum of all positive integers that are not greater than n i.e. 1 + 2 + 3 + ... + n. For example when n = 5 then the correct answer is 15.

Input
In the first line there will be the number 1t10000000 which is the number of instances for your problem. In each of the next t lines there will be one number n for which you should calculate the described initial sum.
Output
For each n print the calculated initial sum. Separate answers with new line character.
Example 1
Input
4
1
2
3
4
Output
1
3
6
10
Example 2
Input
2
10
11
Output
55
66

Note

Input specification allows us to construct rich test cases which are able to distinguis between faster and slower solutions.

Referring to the solution to the Initial sum problem take into account that the possible input data can be too big for the standard int type thus we will use the long long type. Before we set test cases let us present two distinct solutions to the problem:

long long initsum(long long n)
{
  int i;
  long long sum = 0;
  for (i=1; i <= n; i++)
  {
    sum += i;
  }
  return sum;
}

int main()
{
  int t;
  long long n;
  scanf("%d", &t);
  while (t > 0)
  {
    scanf("%lld", &n);
    printf("%lld\n", initsum(n));
    t--;
  }
  return 0;
}

Complexity note

The first solution refers directly to the definition of the problem i.e. the function initsum iterates from 1 to n to calculate desired value. The calculation requires n operations of addition to obtain the result.

It is basic school knowledge that there exists the compact formula for that problem and we use it in the second implementation:

long long initsum(long long n)
{
  return n*(n+1)/2;
}

int main()
{
  int t;
  long long n;
  scanf("%d", &t);
  while (t > 0)
  {
    scanf("%lld", &n);
    printf("%lld\n", initsum(n));
    t--;
  }
  return 0;
}

Both programs are correct answers to the problem but if we want to distinguish the algorithms we can design test cases that only the second solution can pass.

Note

It highly depends on the computational power of the machine that runs test cases. We present test cases that are valid for the computer of this text’s author.

Our suggestion is to design one test case which is easy to pass for both algorithms to give information that the solution is correct and the second test case that is possible to pass only for the second algorithm. It can give an information to the user, that his solution is correct but too slow.

Tip

It is a good practice to design first test case to be the same as input / output example from the problem description. In this way, the user can verify his input and output management.

The user submitting solution similar to the first one will get information about test cases and will be able to see that his program passes first test case and exceed time limit in the second test case.

We cannot put all input and output data here because of its size thus we write it in shortened manner:

Example

Test case 1
Input file
1000
1
2
...
1000
1000000
Output file
1
3
...
500500
500000500000
Test case 2
Input file
999000
1001
1002
...
1000000
Output file
501501
502503
...
500000500000

Computational power of current machines is enough to finish first test case instantly. Both presented algorithms finished computations with time below 0.01s. However it is a good test case for a corectness verification only.

First 1000 positive integers give us the assurance that solution is mathematically correct. We have also added single test with big number i.e. n = 1000000 to make sure that user’s solution bases on long long type. On the other hand the second test case is rich enough to make the first algorithm to exceed even 5s time limit.

The second algorithm works fast enough to pass that test case in time below 0.1s. There is a huge gap between 0.1s and 5s thus we can easily choose safe value as our time limit, for example 1s.

Note

For extended information about distinguishing the efficiency of algorithms see testing time complexity appendix.

We still haven’t chosen judges for test cases and master judge for the problem. We don’t have floating point numbers in our output file specification thus we rather decide to choose Ignoring extra whitespaces judge for both test cases. It leaves users with possiblity of small formating errors without risk of unwanted rejections of theirs solutions. For example it is possible to replace new line characters with spaces in output formatting and still pass the test case.

We assume that we want to accept every correct solution but distinguish the better ones and give them a better score. The Score is % of correctly solved sets master judge is perfect for that purpose. Submitting the first solution achieves the result of 50% while the second solution passes both test cases and its result is 100%.

Note

Presented scoring method assumed both tests as equally worth 50% each. To achieve different distribution of scores you need to modify the master judge and pick the scoring of test cases arbitrary. We present the example in the section writing master judges.

To sum up we present full problem specification:

Complete The Initial Sum Example

Title: The Initial Sum

Description
The task
For a positive integer n calculate the value of the sum of all positive integers that are not greater than n i.e. 1 + 2 + 3 + ... + n. For example when n = 5 then the correct answer is 15.
Input / output specification
Input
In the first line there will be the number 1t10000000 which is the number of instances for your problem. In each of the next t lines there will be one number n for which you should calculate the described initial sum.
Output
For each n print the calculated initial sum. Separate answers with new line character.
Examples
Example 1
Input
4
1
2
3
4
Output
1
3
6
10
Example 2
Input
2
10
11
Output
55
66
Test cases
Test case 1
Input file
1000
1
2
...
1000
1000000
Output file
1
3
...
500500
500000500000

Judge: Ignoring extra whitespaces
Time limit: 1s
Test case 2
Input file
999000
1001
1002
...
1000000
Output file
501501
502503
...
500000500000

Judge: Ignoring extra whitespaces
Time limit: 1s

Master judge: Score is % of correctly solved sets

Judges

We have mentioned that the problem setter can choose from default test cases judges and master judges. In practice it is usually enough.

Test case judges

Each judge returns piece of information about execution of the submission:
  • execution time
  • used memory
  • status (look statuses)
  • score (optional)

Important

Execution time and used memory are obtained automaticaly from system. The score depends on judge design and can take any value.

There are six default test case judges:

Ignoring extra whitespaces (id = 1)

It compares output files up to extra whitespaces. It is the standard judge for all problems with exactly one correct solution (in particular for binary problems).

Example

The prime number problem (i.e. for a number n determine if n is a prime number and then return 1 as a result, otherwise result is 0).

Ignoring floating point errors up to 10^-2 (id = 2)

It allows the floating point numbers to be inaccurate i.e. we can accept the errors up to 0.01. It is good choice for problems when the result is an approximation which don’t require good precision.

Example

The problem of the triangle area, i.e. for given integer side lenghts a, b, c calculate the area of the triangle. In general the result is not an integer and accuracy is not crucial.

Ignoring floating point errors up to 10^-6 (id = 3)

it allows the floating point numbers to be inaccurate i.e. we can accept the errors up to 0.000001. When you need a good precision it is obviously better choice than the previous judge.

Example

The problem of the value of the sine function. The range of the sine function is [-1,1] thus the precision is important.

Score is source length (id = 4)

it is the modification of the Ignoring extra whitespaces judge i.e. it verifies the correctness of the solution in the same manner but in addition it returns the score which is the length of the source code. The judge is designed for challange type problems where the objective is to solve the problem with the shortest source code.

Example

The problem of determining first n numbers in decimal expansion of the number π for given n. The challange is to solve that problem with the shortest possible source code.

Academy (id = 9)
It works exactly the same as Ignoring extra whitespaces judge but in addition it gives the access to difference between model output file and user’s output. Read submission information appendix to learn more about the place where the difference between files is stored.
Exact judge (id = 10)
It requires output files to be identical.

Master judges

Each master judge returns piece of information about the submission:
  • execution time
  • used memory
  • status (look statuses)
  • score (optional)

The master judge can arbitrarily set each part of that information based on the information received from test case judges (look diagram). In our default master judges we assumed following natural convention for that values:

Execution time
The overall execution time is the sum of times from test case judges.
Used memory
The overall used memory is the maximum value of used memory from test case judges.

Important

The final status and the score can be freely combined based on statuses and scores from test case judges.

We have two default master judges both mentioned in section problems:

Generic masterjudge (id = 1000)

It gathers information from test case judges and requires each of them to achieve “accepted” status to establish final status as the accepted.

Example accepted result:

_images/status-generic.png

When any test case ends with error the final answer is inherited from the first failed test case. For example when the problem has five test cases and the second and the fourth ones failed, the final result is inherited from the second test case.

Example time limit exceeded and wrong answer results:

_images/status-tle.png

_images/status-wa.png

Generic masterjudge combines the execution times of all testcases and yields the sum as the final score.

Tip

It is a proper choice when the problem setter requires that the solution fulfills all his requirements i.e. it is correct and sufficiently efficient.

Score is % of correctly solved sets (id = 1001)

It is a more liberal masterjudge which allows to accept incomplete solution with the score which is the percentage of correctly solved test cases.

Example result:

_images/status-percentage.png

For example when the problem has five test cases and only the second failed, the final score is equal to 80%. The advantage is that the user gets more information about the correctness level of its solution.

Tip

It is a proper choice when the problem setter wants to distinguish user’s solutions. It is possible to design test cases to be easier or more difficult to pass.

Example

The problem of power function i.e. for (possibly big) integer numbers a and b calculate the value of ab.

  • The first test case can deliver only input instances for which the result is in the standard numeric type scope.
  • Another test case can require from the solution to implement the big numbers model.

These two test cases give an information on the advancement of the solution.

  • The third test case could also take into account the aspect of the performance and distinguish solutions implementing naive algorithms from the better ones which implement the fast power algorithm.

The least advanced (but in some way correct) solutions will pass the first test case and achieve the result of 33% while the more complex solutions (implementing big numbers) are able to pass the first and the second test and achieve the result of 66%. To achieve the best result of 100% the solution needs to implement both big numbers and fast power algorithms to pass all three test cases.

Enabling master judge

The crucial part of setting the master judge is to decide how it should interpret the score. There are three options the author can choose:

  • binary (bin)
  • maximum (max)
  • minimum (min)

Setting any of these options determines what is the final submission’s score and which score is better.

Binary
The final success result is simply accepted status and the score is the execution time. The faster submissions are better by default. The Generic masterjudge usually works with that option.
Maximum
The final score comes from test case judges and we establish that the greater values are the better ones. It is the proper choice for the Score is % of correctly solved sets masterjudge.
Minimum
The final score comes from test case judges and we establish that the smaller values are the better ones. For example usage of Score is source length test case judge yields the score intended to be minimised.

Writing judges

In the previous section we have discussed default test case judges and master judges which are sufficient for most situations. However there are problems which require individual solutions due to nature of the problem. In this section we present examples of the problems along with descriptions and motivation.

Writing test case judges

Test case judge has access to the following information:
  • model input
  • model output
  • user’s output
  • user’s source code

Impossible model output file

Task example

For given function f find its root i.e. the argument x0that f ( x0) = 0.

In general there are many solutions to the problem, for example for polynomial x2+ x - 2 the numbers 1 and -2 are both correct answers.

You can see that it is hard to prepare model output file in test case. There are possibly infinitely many solutions for the certain functions thus it is impossible to keep all of them in the output file. It forces us to use different approach.

Judge description

The test case judge should verify the condition from the problem task i.e. for the user’s answer from the output file it should check if that answer is a root of the function.

Test case judge uses his access to model input file to read the problem instance.

Ambiguous model output file

Task example

For given graph G with n vertices 1, 2, ..., n determine if it has hamiltonian cycle (i.e. closed loop through a graph that visits each node exactly once). If the hamiltonian cycle exists print it as a sequence of vertices.

It is easy to see that 1-2-3-1 is the same cycle as 2-3-1-2. We could add the requirement to start with the smallest vertex number. Unfortunately it is possible that there exists many different hamiltonian cycles which are not cyclic shifts. We could again use the previous approach and verify if user’s answer is really hamiltonian cycle. Alternatively we can build model output file with all possible hamiltonian cycles:

Judge description

For user’s answer the judge looks for that specific one on the list contained in model output file.

Caution

It can be problematic to keep all answers due to possible huge number of good solutions.

Writing master judges

Similarly to test case judges it is possible to create custom master judges. In certain situations the problem setter may want to extend functionality of existing master judge or even implement brand new one. In this section we present examples of the master judges along with a motivation.

Master judge has access to the following information:
  • results from test case judges
  • user’s source code

Weighted % of correctly solved sets

Here we present the generalisation of Score is % of correctly solved sets master judge. It was a little disadventage that each test case is worth the same and to increase to influence of some submission’s aspect you were forced to produce many test cases.

For example when your test cases verify three aspects A, B and C of the problem and you would like to put weights 20%, 30% and 50% respectively, you were able to do that by creating 10 test cases. Two of them responsible for an aspect A, three of them responsible for an aspect B and five of them responsible for an aspect C. However it is inconvenient and you can consider following idea:

Master judge description

Master judge has the information about the number of test cases and weights which it should assign to each test case. The final score is the weighted sum of accepted test cases.

For example for three test cases a,b,c and weights 20%, 30%, 50% the submission gets one of the possible results depending on passed test cases:

  • no test case passed - 0%
  • a - 20%
  • b - 30%
  • a,b - 50%
  • c - 50%
  • a,c - 70%
  • b,c - 80%
  • a,b,c - 100%

Forbidden structures in source code

The problem setter may require that the solution cannot use some programming structures. For example he may want to allow to use language C++ but with no access to STL library to force users to implement efficent data structures manually. Another example is to restrict source codes to not use loop structures to support only solutions based on recursion.

Master judge description

Master judge uses access to the user’s source code to detect usages of forbidden keywords (for example loops: while, for, goto). When forbidden keyword is detected the final status is set to wrong aswer in other case the master judge performs classical verification (for example the same as Generic masterjudge).

Used memory limitations

We cannot directly support memory limit due to the reasons explained in testing the memory complexity of algorithms appendix. To make possible to bound the amount of available memory one can implement master judge for that purpose.

Master judge description

Master judge gathers the information of used memory from test case judges and takes the maximum value as the result (note that this is the behaviour of default master judges). We verify the memory limit with respect to the user’s solution programming language to adjust the master judge for all programming languages we allow to use.

Important

It’s very imporant to adjust the memory limit to the programming language due to different memory needs of programs in different languages.

Appendix

Good test cases design

It is common misunderstanding which leads to bad test cases design. The number of test cases assigned to the problem is limited. The individual test case is not intended to test only one problem instance.

Tip

We recommend you to redesign your input / output specification to handle with multiple problem instances in one test case.

Consider the following elementary problem as an example:

Task example

For given integer numbers a and b calculate the sum a + b.

We could design the input / output specification to calculate the sum only for two numbers:

Poor input / output design example

Input
In the only line of the input there will be two integer numbers a and b separated by a single space character.
Output
Program should write a single number which is the value of a + b.

It is correct but it is highly not recommended. First of all using even all possible slots for test cases cover a small part of the possible problem instances. Secondly, the execution of each test case is time consuming (about 2s additional time for each test case).

We recommend to redesign the input / output specification in following manner:

Proper input / output design example

Input
In the first line there will be the number t which is the number of instances for the problem. In each of the next t lines there will be the pair of two numbers a and b for which you should calculate the value of a + b.
Output
For each a,b pair print the calculated sum. Separate answers with new line character.

As you can see it is possible to pack a large number of problem instance into single test case.

Note

Multiple test cases should rather be used to test different aspect of the problem.

Interactive problems

The standard schema of the submission processing assumes that at first the execution of the user’s program generates the user’s output and after that the test case judge in a certain way compares it with the model output. In the interactive problem these phases are executed simultaneously.

The execution of the user’s program starts and its standard output is directed to the test case judge as its standard input. In the same time the test case judge program starts and its standard output is directed to the user’s program as its standard input. This allows a conversation between the user’s program and the test case judge as you can see in the picture below.

Interactive problem communication diagram

Important

Interactive problem requires dedicated test case judge.

When the conversation is open, the test case judge can interactively generate input for the user’s program depending on its output and in the same time it is possible to verify the correctness of the results.

Interactive problem example

Write a program that will win the chess game with the opponent who make random legal moves.

Input
The first line of the input contains the number s which can be 0 or 1. If s = 0, then your opponent plays white pieces and if s = 1 then you plays white pieces. In the next lines there will be moves of your opponent write in the standard chess algebraic notation (for example e4 e5 means that the piece on e4 moves to e5). You can assume that all opponent moves are legal. When the match is over the final line of the input will be “win” or “lose” and the program will be accepted only if the user’s solution win the match.
Output
If s = 1 you start with the first move (i.e. you write the line with your move in the standard chess algebraic notation) and after that you write your next move after you get your opponent’s move and so on and so forth. If s = 0 you wait for your opponent’s move and then again you play move by move.

By the description of the interactive problem example you can see that we need to implement the test case judge which is able to play random chess game:

  • it remembers the state of the chess board,
  • it can make legal moves
  • it can verify that the game is over and who is the winner
  • it accepts only solutions which won the game

Attention

Due to common issues with the standard output writes we highly recommend you to clear the output buffer after printing each line. It can be done by using the fflush(stdout) command in the C/C++ languages.

Time complexity

Test cases along with time limits give a possibility of verification time complexity of algorithms. Consider the most basic case when the author knows two different algorithms for a problem, say A and B and let us assume that the algorithm A is noticeably faster than the algorithm B.

Note

It is not always easy and obvious how to preper test cases to distinguish between two algorithms.

Assuming that we have input data which is processed in the time tAfor the algorithm A which is much faster than execution time tBfor the algorithm B we can simply set the time limit somewhere between those values.

Important

The presented approach highly depends on the machine thus you need to adjust your time limit to the computing cluster rather then your local machine.

With the timeout tAt0tBwe can assume that A-like algorithms will pass the test case and B-like algorithms will fail it due to exceeding the time limit.

Caution

Presented method is not a real time complexity testing, slower algorithm can beat the faster one when it is well technically optimized for the test cases and the machine.

It is also not a universal method - changing the machine can allow slower algorithms to pass test cases designed for faster algorithms only.

The sorting problem is one of the most demonstrative example when there are many different solutions. All natural solutions need approximetly n2 operations to sort the sequence of length n. However, the more sophisticated algorithms guarantee approximately nlog(n) operations which is significantly better result.

In the problem example section you can see properly prepared test cases which distinguish solutions for The initial sum[1] problem.

Note

Obviously for problems with many (not only 2) solutions of different speeds you can construct a hierarchy of test cases to reflect the gradation of solutions in scores.

Memory complexity

Similarly to time complexity testing one can test memory complexity of algorithms. Consider the simplest situation when the author knows two different algorithms for a problem, say A and B. Let us assume that algorithm A consumes small and constant amount of memory and algorithm B memory needs are dependent on the problem input data (possibly big amounts).

You can distinguish between solutions A and B by constructing adjusted test cases. If we denote that designed test case makes algorithm A to use mA megabytes of memory and algorithm B to use mB megabytes of memory and these values are separated you can set the memory limit m0 megabytes somewhere between mA and mB.

Important

We do not directly support memory limit option due to complications with solutions written in virtual machine interpreted languages (for example Java languages family).

Due to the note above you need to approach individually to limit the memory that program can use. As we said there is no single parameter which sets memory limit. To obtain desired functionality you can construct custom master judge and limit the memory inside separetely for each programming language you allow to use for solutions.

Example

The prime number[2] problem can be solved in constant memory by looking for divisors or alternatively with Sieve of Eratosthenes algorithm which consumes the amount of memory which depends on the input number.

Statuses

There are two levels when the status is assigned to the submission:

  • test case the status is produced by the test case judge,
  • master judge the status is a combination of statuses from test cases.

The master judge is a high order level component and it can arbitrary assign any status to the submission. We are going to focus on the test case judge statuses.

We separate statuses into two groups: semantic and systemic. The semantic statuses are strictly related to the correctness of the answer to the problem. On the other hand, the systemic statuses are syntactic related and the judge gets it from the system.

Semantic statuses
  • Accepted (AC) the submission is a correct solution to the problem.
  • Wrong answer (WA) the submission is an incorrect solution.
Sytemic statuses
  • Time limit exceeded (TLE) the submission execution took too long.
  • Runtime error (RE) the error occurred during program execution.
    • NZEC (Non-Zero Exit Code) main function returned error signal (for example main function in C/C++ should return 0).
    • SIGSEGV the program accessed unallocated memory (segmentation fault).
    • SIGABRT the program received abort signal, usually programmer controls it (for example when C/C++ assert function yields false).
    • SIGFPE the floating point error, usually occurs when dividing by 0.
  • Compilation error (CE) the error occurred during compilation or syntax validation in interpreter.
  • Internal error (IE) the error occurred on the serivice side. One of the possible reasons can be poorly designed test case judge or master judge.

Note

The Internal error covers wide area of errors (including server errors) thus in the near future we will introduce another type of error for judge and master judge errors.

To ilustrate errors consider again the following example:

Example

For a positive integer n calculate the value of the sum of all positive integers that are not greater than n i.e. 1 + 2 + 3 + ... + n. For example when n = 5 then the correct answer is 15.

Input
In the first line there will be the number 1t10000000 which is the number of instances for your problem. In each of the next t lines there will be one number n for which you should calculate the described initial sum.
Output
For each n print the calculated initial sum. Separate answers with new line character.

The first error which can occur is the compilation error, for example submitting the following source code would produce the CE status:

long long initsum(long long n)
{
  return n*(n+1)/2;
}

int main()
{
  int t // missing semicolon
  long long n;
  scanf("%d", &t);
  while (t > 0)
  {
    scanf("%lld", &n);
    printf("%lld\n", initsum(n));
    t--;
  }
  return 0;
}
_images/status-appendix-ce.png

To obtain runtime error we can refer to unallocated memory:

long long initsum(long long n)
{
  return n*(n+1)/2;
}

int main()
{
  int t;
  long long n;
  scanf("%d", &t);
  while (t > 0)
  {
    scanf("%lld", n); // referring to unallocated memory
    printf("%lld\n", initsum(n));
    t--;
  }
  return 0;
}
_images/status-appendix-re.png

We will exceed time limit with worse algorithm (if test cases are rich enough):

// suboptimal algorithm
long long initsum(long long n)
{
  int i;
  long long sum = 0;
  for (i=1; i <= n; i++)
  {
    sum += i;
  }
  return sum;
}

int main()
{
  int t;
  long long n;
  scanf("%d", &t);
  while (t > 0)
  {
    scanf("%lld", &n);
    printf("%lld\n", initsum(n));
    t--;
  }
  return 0;
}
_images/status-appendix-tle.png

Bad output formatting causes wrong answer status:

long long initsum(long long n)
{
  return n*(n+1)/2;
}

int main()
{
  int t;
  long long n;
  scanf("%d", &t);
  while (t > 0)
  {
    scanf("%lld", &n);
    printf("%lld", initsum(n)); // missing new line character
    t--;
  }
  return 0;
}
_images/status-appendix-wa.png

At the end we present correct and optimal solution which passes all test cases and obtains accepted status:

long long initsum(long long n)
{
  return n*(n+1)/2;
}

int main()
{
  int t;
  long long n;
  scanf("%d", &t);
  while (t > 0)
  {
    scanf("%lld", &n);
    printf("%lld\n", initsum(n));
    t--;
  }
  return 0;
}
_images/status-appendix-acc.png

Submission information

TODO: How the information from judges flows.


Footnotes

[1]The initial sum problem is to calculate the value of 1 + 2 + 3 + ... + n for given integer n.
[2]The prime number problem is to verify if n is a prime number.