THC Science

This manual contains some suggestions which aim to contribute towards writing clear, pleasant looking, and hopefully interesting computer science articles. This guide is a complement to the general Manual of Style.

Suggested structure of a computer science article[edit]

Probably the hardest part of writing any technical article is the difficulty of addressing the level of technical knowledge on the part of the reader. A general approach is to start simple, and then move toward more formal and technical statements as the article proceeds. The following structure is merely recommended; editorial discretion and consensus might find an alternative structure more appropriate for some subjects.

Article introduction[edit]

The article should start with an lead section of length appropriate to the article content, which describes the subject in general terms and properly summarizes the article. This opening material should give the casual reader a quick understanding of the concept. Name the field(s) of computer science this concept belongs to and describe the context in which the term is used. Write the article title and any alternative names in bold. Include the historical motivation, provide some names and dates, etc. Here is an example:

In computer science, communicating sequential processes (CSP) is a formal language for describing patterns of interaction in concurrent systems. It is a member of the family of mathematical theories of concurrency known as process algebras, or process calculi, based on message passing via channels. CSP was highly influential in the design of the occam programming language and also influenced the design of programming languages such as Limbo, RaftLib, Erlang, Go, Crystal, and Clojure's core.async. CSP was first described in a 1978 article by Tony Hoare, but has since evolved substantially. CSP has been practically applied in industry as a tool for specifying and verifying the concurrent aspects of a variety of different systems, such as the T9000 Transputer, as well as a secure ecommerce system. The theory of CSP itself is also still the subject of active research, including work to increase its range of practical applicability (e.g., increasing the scale of the systems that can be tractably analyzed).

Background and application[edit]

It is a good idea to begin the main part of the article with an informal introduction, commonly titled "Background" though "Overview" has also been used, that gives the non-technical reader a basic understanding of the fundamental concepts of the topic being presented. If the concept in question is somewhat theoretical, it is helpful to describe its purpose or applied use(s) toward the top of the article, either concisely in the lead section or in a post-lead introductory section. It is also useful to have some representative examples, which could serve both to expand on the concept in question and to provide some context as to why one might want to use that concept.

A picture is a great way of bringing a point home, and often it could even precede the technical discussion of a concept. WP:How to create graphs for Wikipedia articles has some hints on how to create graphs and other pictures, and how to include them in articles.

If not included in the introductory material, a section about the history of the concept is often useful and can provide additional insight. This often forms its own "History" section.

Structuring different kinds of articles[edit]

Computer science is a broad field which encompasses a number of different kinds of ideas. The structure of the main part of an article will vary with the type of article. Here are some general guidelines for structuring a few different classes of computer science articles. Where possible, these guidelines include one or two examples of a "good" article, i.e. an article demonstrates the style we're aiming for in that particular class of article. Always keep in mind that these are guidelines, not hard-and-fast rules: some articles will need to deviate from this structure; some articles will be hard to classify; some won't fit into these classifications at all. Use common sense in applying these guidelines.

Algorithms and data structures[edit]

An article on an algorithm should generally consist of:

A description of the algorithm (including pseudocode)
A formal discussion of the algorithm's time and space complexity
A discussion of any implementation and performance issues

A good example is binary search algorithm, a featured article.

An article on a data structure should consist of:

A description of the structure, and any operations that can be performed on the structure
An overview of the applications for the structure in question
A discussion of any implementation and performance issues

Classic problems[edit]

An article describing a classic problem should generally consist of:

A statement of the problem.
A discussion of the relevance of the problem.
A discussion of the history of the problem if notable.
A description of solutions to the problem if any exist.

An example is the dining philosophers problem.

Design patterns[edit]

The structure of an article describing a software design pattern or other design pattern in computer science (including creational patterns, structural patterns, behavioral patterns, and concurrency patterns) should draw on industry best-practice models provided by reliable sources on the subject. Some of these include:

Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (the "Gang of Four" or GoT; 1994, Addison-Wesley / O'Reilly). Our article on the book at Design Patterns § Patterns by type links to a number of such design-pattern articles.
Design Patterns for Object-Oriented Software Development by Wolfgang Pree (1995, Addison-Wesley)
Various works on design patterns for specific languages and other applications are available for free reading via the Internet Archive Open Library [1], and a variety of others are hosted on free websites.

Fields of study[edit]

An article describing a field of study within computer science should include:

A brief overview of the history of the field
A basic introduction to the key concepts around which the field revolves
A description of the important principles, theorems, or results produced by the field
A discussion of relationships with other fields of study (inside and outside CS)
Some pointers to articles that describe particular aspects of the field in greater detail

Formalisms[edit]

An article describing some kind of formalism should contain:

A brief outline of the history of the formalism (originator, major developments, etc.)
An informal introduction to the basic concepts involved (preferably including some simple examples)
A formal definition
Discussion of important applications or implementations
Description of major tools that support the formalism
Brief overview of related or derived formalisms

A good example is lambda calculus.

Programming constructs[edit]

An article describing a programming construct should generally include:

A brief overview of what the construct is, and how it is used (perhaps including an informal operational semantics)
A listing of alternative names for the construct
Pseudocode or a small code sample demonstrating the construct in use
Description of any equivalences between the construct and other constructs
A discussion of any variations in the semantics of the construct
A discussion of any disadvantages to use of the construct

Examples include continuation, closure (computer programming), and exception handling.

Programming languages[edit]

An article describing a programming language should generally include at least:

A brief outline of the history of the language
An overview of the language features
- Programming paradigm(s) that the language supports, and how well it supports them
- Style of type-checking, support for design by contract or other specification techniques
- Memory management style
A basic introduction to the language syntax (including some code samples)
An overview of the formal semantics of the language (if one exists)
A list of available implementations and supported platforms

A good example is the Rust (programming language) article.

Theorems and conjectures[edit]

An article describing theorems or conjectures should generally contain at least:

A brief informal description of the theorem or conjecture.
A formal statement of the theorem or conjecture.
A discussion of the history of the theorem or conjecture.
A discussion of impacts, consequences, or implications of the theorem or conjecture.

As many computer science theorems and conjectures are often stated informally in popular literature it may also be beneficial to provide some discussion of common misconceptions or misinterpretations of the theorem or conjecture.

The names of such things are not capitalized except where they contain a proper name (e.g. a surname).

Good examples include halting problem and Church–Turing thesis.

Tools[edit]

An article describing a class of tools should generally contain:

A brief overview of the purpose of the tool
A brief history of the development of the class of tool
Discussion of any major subclasses of the tool class
Brief description of the key underlying algorithms or implementation techniques

Examples include compiler, text editor, and automated theorem proving.

Concluding matters[edit]

See WP:Manual of Style/Layout § Standard appendices and footers for use of "See also", "Notes", "References", "External links" sections, and navigation templates.

Style guidelines[edit]

WP:Manual of Style/Mathematics has some good advice on how to discuss more theoretical topics, as well as when and how to include proofs of important theorems. It also includes guidance on how to typeset equations, logical expressions, and other mathematical notation.
WP:Manual of Style/Dates and numbers § Non-base-10 notations gives advice on formatting e.g. binary or hexadecimal numbers and memory addresses.

Code samples[edit]

Shortcuts

Samples of actual sources get included in articles for a variety of reasons, although the most typical reasons are to demonstrate the "look" of a particular language, to provide examples of language-specific constructs or features, and to provide examples of algorithms not easily expressed in pseudocode. While there's nothing inherently wrong with including sample code, excessive amounts of it can detract from the content of the article itself; avoid writing sample code unless it contributes significantly to a fundamental understanding of the encyclopedic content.

Wikipedia is not a source code repository. Code that is not relevant to encyclopedic content should be moved out of Wikipedia. WikiBooks is the appropriate Wikimedia project for existing GFDL-compatible code; in particular Wikibooks:Algorithm Implementation. The external LiteratePrograms and Rosetta Code wikis are appropriate places to put new sample implementations, along with descriptions of how those implementations work. Important note: existing implementations on Wikipedia cannot be transferred to the LiteratePrograms wiki because Wikipedia content is licensed under the GFDL and/or CC-BY-SA, while LiteratePrograms uses the more liberal MIT/X11 license; relicensing existing code from GFDL/CC-BY-SA to MIT/X11 would require the permission of all copyright holders

Some general guidelines on code samples:

Sample implementations of algorithms are fine, but every algorithm article should include a pseudocode description of the core algorithm when possible, so that anyone can understand how the algorithm works. See below for guidelines on pseudocode.
The sample should use a language that clearly illustrates the algorithm to a reader who is relatively unfamiliar with the language— even if you believe that the language is well-known. To remain language-neutral, choose languages based on clarity, not popularity. Languages that emphasize readability, such as Python and Ruby, are good choices. Avoid esoteric or language-specific operations.
Source code implementations must be compatible with the GFDL and CC-BY-SA licenses (which are incompatible with the GPL).
Multiple source code implementations are not appropriate unless they contrast specific aspects of the code and that contrast is important to the encyclopedic content of the article. If possible, accentuate differences by providing the alternate implementation in the same language as the original.
All code samples should be marked up using one of the following methods:
- Surrounding a short, inline code sample in <code>...</code> or <syntaxhighlight lang="x" inline>
- Surrounding a block of code in <syntaxhighlight lang="x"> or <pre>...</pre> tags
- Indenting each line of a block of code by one or more spaces (unlike the above method, this allows you to use basic text formatting like italic, bold, etc. in the sample) This typesets them in a monospaced typeface to ensure that spacing is preserved and provides additional information to screen readers and users with customized CSS. Doing this is particularly important for languages where whitespace has syntactic significance— ideally we'd like people to be able to copy and paste sample code into a text editor or IDE. For example,

int main(void) {
    printf("hello, world\n");
    return 0;
}

Syntax highlighting may be obtained by using SyntaxHighlight MediaWiki tag instead of <pre>...</pre>, with a specific lang attribute specifying the language name (the value wikitext is used for MediaWiki markup). See Extension:SyntaxHighlight for supported languages. The following is syntax-highlighted Java code, for example, using <syntaxhighlight lang="java">:

class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!"); // Print "Hello World!"
    }
}

Since code listings are not line-wrapped, and many people read Wikipedia in space-constrained environments, ensure that code listings have a maximum line length of 60 chars. Apply manual word wrap if necessary.

Algorithms[edit]

Shortcuts

There are no universally accepted standards for presenting algorithms on Wikipedia. A past attempt at standardized pseudocode is archived at User:Dcoetzee/Wikicode, though "[t]he author advises that such a proposal not be advanced again, as it is unlikely to gain consent". Within WikiProject Computer science, the consensus has generally been that where possible, algorithms should be presented in pseudocode. The use of pseudocode is completely language agnostic, and is more NPOV with respect to programming languages in general. Pseudocode also provides far more flexibility with regard to the level of implementation detail, allowing algorithms to be presented at however high a level is required to focus on the algorithm and its core ideas, rather than the details of how it is implemented. Finally, suitably high-level pseudocode provides the most accessible presentation of an algorithm, particularly for non-programmers.

Example[edit]

algorithm ford-fulkerson is
    input: Graph G with flow capacity c, 
           source node s, 
           sink node t
    output: Flow f such that f is maximal from s to t

    (Note that f_(u,v) is the flow from node u to node v, and c_(u,v) is the flow capacity from node u to node v)

    for each edge (u, v) in G_E do
        f_{(u, v)} ← 0
        f_{(v, u)} ← 0

    while there exists a path p from s to t in the residual network G_f do
        let c_f be the flow capacity of the residual network G_f
        c_f(p) ← min{c_f(u, v) | (u, v) in p}
        for each edge (u, v) in p do
            f_{(u, v)} ←  f_{(u, v)} + c_f(p)
            f_{(v, u)} ← −f_{(u, v)}

    return f

General guidelines for writing pseudocode[edit]

Make sure the algorithm is understandable without having to read this page first.
Try to focus on outlining the algorithm, and where possible keep discussion and explanation of the algorithm outside of the pseudocode.
All assignment, comparison, and other mathematical operators should be rendered with proper mathematical symbols wherever possible:

Symbols for use in pseudocode
Operator	Result	Entity	Example
Assignment	← or :=	←, :=	`c ← 2πr`, `c := 2πr`
Comparison	=, ≠, <, >, ≤, ≥	=, ≠, <, >, ≤, ≥
Arithmetic	+, −, ×, /, mod	+, −, ×, /, mod
Floor/ceiling	⌊, ⌋, ⌈, ⌉	&lfloor;, &rfloor;, &lceil;, &rceil;	`a ← ⌊b⌋ + ⌈c⌉`
Logical	and, or	'''and''', '''or'''	`a ≥ b and a ≤ c`
Sums	∑ ∏	∑ ∏	`h ← ∑_a∈A 1/a`

Low-level pseudocode[edit]

All control structure keywords should be bolded, and comments should be in italics (in addition to whatever other manner for denoting comments is used).
Try to keep the algorithm description sufficiently high level so as to avoid most implementation specific details.
Try to avoid using structures or techniques that are idiomatic to a particular language or programming paradigm.
If idiomatic structures or techniques are unavoidable try to present them in a generic high-level way in line with the style outlined below:
The preferred conditional structure is

if condition then
    code path
else if condition then
    code path
else
    code path
end if

The preferred looping constructs are

while condition do
    code block
repeat

and

for loop_control do
    code block
repeat

where loop_control is any suitable clause to control a for loop, such as item in list or 1 ≤ i ≤ n etc.

The preferred function definition structure is

function function_name is
    code block
    return variable
end function

where function_name is any reasonable format of function name and arguments. Alternatively, inputs and outputs can be specified within the function block:

function function_name is
    input: description of inputs
    output: description of outputs
    code block
    return variable
end function

Code blocks should be indented. If sufficiently clear, block closing keywords (end, repeat etc.) may be omitted.

An example of pseudocode roughly hewing to these guidelines is provided as the example on the Algorithm page, and in Bucket sort, Topological sort, and Pollard's rho algorithm.

High-level pseudocode[edit]

Algorithms that are most easily presented at a very high level, such as Buchberger's algorithm or the Pohlig–Hellman algorithm should be presented in the format

Inputs description of input arguments

Output description of outputs

(description of a step in the algorithm)
(the next step in the algorithm)
1. (substep)
(etc.)

using markup:

:'''Inputs''' ''description of input arguments''
:'''Output''' ''description of outputs''
:# ''(description of a step in the algorithm)''
:# ''(the next step in the algorithm)''
:## ''(substep)''
:# ''(etc.)''

Descriptions of steps should be high level, and may simply be English sentences.
Substeps of the algorithm, due to branch conditions or loop structures should be indented and subnumbered.
Termination of the algorithm with a return value should be denoted using the keyword return

Examples of algorithms in this format include Buchberger's algorithm, Pohlig–Hellman algorithm, Itoh–Tsujii inversion algorithm, Baby-step giant-step, and Pollard's p − 1 algorithm.

Including references[edit]

It is essential for article content to be verifiable with reliable sources, a well-chosen list of references and pointers to the literature. Any quotation, any material challenged or likely to be challenged, and any claim that involves a living person must be supported by an inline citation to a reliable secondary source. Some additional reasons for citing high-quality sources are the following:

Providing proper source citations enables other editors and our readers to verify the given information and evaluate the sources. This is especially important in any science topic, since the state of the art is always advancing.
A reader may desire more details, but Wikipedia is not a substitute for a textbook (that is what Wikibooks is for). Many research papers and books of benefit to topical newcomers are now freely available online.
Some notions are defined differently depending on context or author. Articles require references that support the given usage, and should identify conflicting usages (giving due weight to the sources for them).
Important algorithms, theorems, or definitions should cite the concept-originating papers as historical and technical information, in addition to later secondary reference works relied on for modern, applied context.

Primary sources must be used with care. They are generally permissible for non-controversial and non-promotional claims about the subject or its author(s). Examples: referencing a blog post from the Rust programming language development team is sufficient for a point about the internal architecture of the Rust compiler, or the motivation behind a specific decision about the Rust language design. However, such a source cannot be used for providing Rust benchmarks, or contrasting Rust's features versus those of another language, since such claims from the developers may have a promotional element and involve analysis, evaluation, interpretation, or synthesis of facts and evidence, which requires a secondary source.

Wikipedia does not enforce a specific reference and citation style; just use one consistently within a particular article.

See WP:Citing sources for how to cite sources properly.