The core technology used in a videoteleconference (VTC) system is digital compression of audio and video streams in real time. The hardware or software that performs compression is called a codec (coder/decoder). Compression ratios of up to 1:500 can be achieved. The resulting digital stream of 1s and 0s is subdivided into labelled packets, which are then transmitted through a digital network of some kind (usually ISDN or IP). Audio modems in the transmission line allow the use of POTS, the Plain Old Telephone System, in some low-speed applications such as videotelephony, because they convert the digital pulses to and from analog waves in the audio spectrum range.
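To give a feel for the numbers involved, the following minimal sketch works out what a 1:500 compression ratio means for one example video format and shows one simple way a byte stream could be split into labelled packets. The frame size, frame rate, packet size, and all function names are illustrative assumptions, not values or interfaces taken from any particular codec or network standard.

```python
def raw_video_bitrate(width, height, bits_per_pixel, fps):
    """Bits per second of uncompressed video."""
    return width * height * bits_per_pixel * fps


def compressed_bitrate(raw_bps, ratio):
    """Bits per second after compressing by 1:ratio."""
    return raw_bps / ratio


def packetize(payload, packet_size):
    """Split a byte stream into (sequence_number, chunk) packets."""
    offsets = range(0, len(payload), packet_size)
    return [(seq, payload[off:off + packet_size])
            for seq, off in enumerate(offsets)]


def reassemble(packets):
    """Rebuild the byte stream, tolerating out-of-order arrival."""
    return b"".join(chunk for _, chunk in sorted(packets))


# Assumed example format: 640x480 pixels, 24 bits per pixel, 30 frames/s.
raw = raw_video_bitrate(640, 480, 24, 30)
print(raw, compressed_bitrate(raw, 500))

# The label (here just a sequence number) lets the receiver restore order.
packets = packetize(b"abcdefghij", 4)
print(reassemble(list(reversed(packets))))
```

Under these assumptions the uncompressed stream is roughly 221 Mbit/s, while the compressed stream is roughly 442 kbit/s, which is why compression is what makes transmission over ISDN-class links feasible at all.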

The other components required for a VTC system include:

* Video input: video camera or webcam
* Video output: computer monitor, television or projector
* Audio input: microphones
* Audio output: usually loudspeakers associated with the display device or telephone
* Data transfer: analog or digital telephone network, LAN or Internet

There are basically two kinds of VTC systems:

1. Dedicated systems have all required components packaged into a single piece of equipment, usually a console with a high-quality, remote-controlled video camera. These cameras can be controlled at a distance to pan left and right, tilt up and down, and zoom, and are therefore known as PTZ cameras. The console contains all electrical interfaces, the control computer, and the software or hardware-based codec. Omnidirectional microphones are connected to the console, as well as a TV monitor with loudspeakers and/or a video projector. There are several types of dedicated VTC devices:
   1. Large group VTC are non-portable, large, more expensive devices used for large rooms and auditoriums.
   2. Small group VTC are non-portable or portable, smaller, less expensive devices used for small meeting rooms.
   3. Individual VTC are usually portable devices meant for single users, with fixed cameras, microphones and loudspeakers integrated into the console.
2. Desktop systems are add-ons (usually hardware boards) to normal PCs, transforming them into VTC devices. A range of different cameras and microphones can be used with the board, which contains the necessary codec and transmission interfaces. Most desktop systems work with the H.323 standard. Video conferences carried out via dispersed PCs are also known as e-meetings.

A fundamental feature of professional VTC systems is acoustic echo cancellation (AEC). AEC is an algorithm that detects when sounds or utterances that came from the audio output of a system re-enter the audio input of the same system's codec after some time delay. If left unchecked, this echo can lead to several problems, including 1) the remote party hearing their own voice coming back at them (usually significantly delayed), 2) strong reverberation, which makes the voice channel hard to understand and therefore useless, and 3) howling created by feedback. Echo cancellation is a processor-intensive task that usually works over a narrow range of sound delays.
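One common way to build an echo canceller is an adaptive filter that learns how the far-end audio reappears at the microphone and subtracts that estimate. The sketch below uses the normalized least-mean-squares (NLMS) update as an illustration; it is not claimed to be the algorithm of any specific VTC product. The room model (`simulate_echo`), the function names, and the parameter values are assumptions made for the example. The `taps` parameter bounds the longest echo delay the filter can model, which illustrates why cancellation only works over a limited range of delays.

```python
import random


def simulate_echo(far_end, delay, gain):
    """Hypothetical room: the mic hears a delayed, attenuated far-end copy."""
    return [gain * far_end[n - delay] if n >= delay else 0.0
            for n in range(len(far_end))]


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return the mic signal with the estimated echo removed (NLMS)."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate          # what is left after cancellation
        norm = sum(h * h for h in history) + eps
        # Step size is normalized by the recent far-end signal energy.
        weights = [w + mu * error * h / norm
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = simulate_echo(far_end, delay=3, gain=0.6)   # echo only, no near-end talker
residual = nlms_echo_cancel(far_end, mic)

# Compare echo power before and after cancellation once the filter has adapted.
print(power(mic[-200:]), power(residual[-200:]))
```

In this idealized setting (white-noise far-end signal, a single echo path shorter than the filter, no near-end speech) the residual becomes very small once the filter has adapted. Real systems must also cope with simultaneous talkers and with echo paths that change over time, which this sketch does not attempt.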

Simultaneous videoconferencing among three or more remote points is possible by means of a Multipoint Control Unit (MCU). This is a bridge that interconnects calls from several sources, in a similar way to an audio conference call. Either all parties call the MCU, or the MCU calls the participating parties in sequence. There are MCU bridges for IP and ISDN-based videoconferencing. Some MCUs are pure software, and others are a combination of hardware and software. An MCU is characterised by the number of simultaneous calls it can handle, its ability to convert between data rates and protocols, and features such as Continuous Presence, in which multiple parties can be seen onscreen at once.

MCUs can be stand-alone hardware devices, or they can be embedded into dedicated VTC units.

Some systems are capable of multipoint conferencing with no MCU at all, whether stand-alone or embedded. These use a standards-based H.323 technique known as "decentralized multipoint", where each station in a multipoint call exchanges video and audio directly with the other stations with no central "manager" or other bottleneck. One advantage of this technique is that the video and audio will generally be of higher quality because they do not have to be relayed through a central point. Another is that users can make ad-hoc multipoint calls without any concern for the availability or control of an MCU. This added convenience and quality come at the expense of some increased network bandwidth, because every station must transmit to every other station directly.
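The bandwidth trade-off can be made concrete with a little arithmetic. The sketch below compares the number of media streams in a decentralized call with the number in an MCU-based call. It assumes, for illustration only, that every station sends one stream to each other station in the decentralized case and that an MCU receives one stream from each station and returns one combined stream to each. The function names and the 384 kbit/s stream rate are hypothetical.

```python
def mesh_total_streams(stations):
    """Decentralized: every station sends one stream to every other station."""
    return stations * (stations - 1)


def mcu_total_streams(stations):
    """MCU: one stream up to the bridge and one combined stream back, per station."""
    return 2 * stations


def mesh_uplink_kbps(stations, stream_kbps):
    """Upstream bandwidth one station needs in a decentralized call."""
    return (stations - 1) * stream_kbps


# Assumed example: each stream is 384 kbit/s.
for n in (3, 4, 8):
    print(n, mesh_total_streams(n), mcu_total_streams(n), mesh_uplink_kbps(n, 384))
```

Under these assumptions the decentralized stream count grows with the square of the number of stations, while the MCU-based count grows linearly, so the extra bandwidth cost is modest for three or four stations and becomes significant for larger calls.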

Some observers [1] argue that two outstanding issues are preventing videoconferencing from becoming a standard form of communication, despite the ubiquity of videoconferencing-capable systems. These issues are:

1. Eye Contact: It is known that eye contact plays a large role in conversational turn-taking, perceived attention and intent, and other aspects of group communication [2]. While traditional telephone conversations give no eye contact cues, videoconferencing systems are arguably worse in that they give the incorrect impression that the remote interlocutor is avoiding eye contact. This issue is being addressed through research that generates a synthetic image with eye contact using stereo reconstruction [3].
2. Appearance Consciousness: A second problem with videoconferencing is that one is literally on camera, with the video stream possibly even being recorded. The burden of presenting an acceptable on-screen appearance is not present in audio-only communication. Early studies by Alphonse Chapanis found that the addition of video actually impaired communication, possibly because of the consciousness of being on camera.

The issue of eye-contact may be solved with advancing technology, and presumably the issue of appearance consciousness will fade as people become accustomed to videoconferencing.

