The Phraselator is a weatherproof handheld language translation device developed by Applied Data Systems and VoxTec, a former division of the military contractor Marine Acoustics, located in Annapolis, Maryland, USA. It was designed to serve as a handheld computer device that translates English into one of 40 different languages. == The device == The Phraselator is a small speech translation PDA-sized device designed to aid in interpretation. The device does not produce synthesized speech like that utilized by Stephen Hawking; instead, it plays pre-recorded foreign language MP3 files. Users can select the phrase they wish to convey from an English list on the screen or speak into the device. It then uses speech recognition technology called DynaSpeak, developed by SRI International, to play the proper sound file. The accuracy of the speech recognition software is over 70 percent according to software developer Jack Buchanan. The device can also record replies for translation later. Pre-recorded phrases are stored on Secure Digital flash memory cards. A 128 MB card can hold up to 12,000 phrases in four or five languages. Users can download phrase modules from the official website, which contained over 300,000 phrases as of March 2005. Users can also construct their own custom phrase modules. Earlier devices were known to have run on an SA-1110 Strong Arm 206 MHz CPU with 32MB SDRAM and 32MB onboard Flash RAM. A newer model, the P2, was released in 2004 and developed according to feedback from U.S. soldiers. It translates one way from English to approximately 60 other languages. It has a directional microphone, a larger library of phrases and a longer battery life. The 2004 release was created by and utilizes a computer board manufactured by InHand Electronics, Inc. In the future, the device will be able to display pictures so users can ask questions such as "Have you seen this person?" Developer Ace Sarich notes that the device is inferior to human interpreter. Conclusions derived from a Nepal field test conducted by U.S. and Nepal based NGO Himalayan Aid in 2004 seemed to confirm Sarich's comparisons: The very concept of using a machine as a communication point between individuals seemed to actually encourage a more limited form of interaction between tester and respondent. Usually, when limited language skills are present between parties, the genuine struggle and desire to communicate acts as a display of good will – we openly display our weakness in this regard – and the result is a more relaxed and human encounter. This was not necessarily present with the Phraselator as all parties abandoned learning about each other and instead focused on learning how to work with the device. As a tool for bridging any cultural differences or communicating effectively at any length, the Phraselator would not be recommended. This device, at least in the form tested, would best be used in large-scale operations where there is no time for language training and there is a need to communicate fixed ideas, quickly, over the greatest distance by employing large amounts of unskilled users. Large humanitarian or natural disasters in remote areas of third-world countries might be an effective example. == Origin == The original idea for the device came from Lee Morin, a Navy doctor in Operation Desert Storm. To communicate with patients, he played Arabic audio files from his laptop. He informed Ace Sarich, the vice president of VoxTec, about the idea. VoxTec won a DARPA Small Business Innovation Research grant in early 2001 to develop a military-grade handheld phrase translator. During its development, the Phraselator was tested and evaluated by scientists from the Army Research Laboratory. The device was first field tested in Afghanistan in 2001. By 2002, about 500 Phraselators were built for soldiers around the world with another 250 ordered by the U.S. Special Forces. The device cost $2000 to develop and could convert spoken English into one of 200,000 recorded commands and questions in 30 languages. However, the device could only translate one-way. At the time, the only existing two-way voice translator that could convert speech back and forth between languages was the Audio Voice Translation Guide System, or TONGUES, which was developed by Carnegie Mellon University for Lockheed Martin. As part of a DARPA program known as the Spoken Language Communication and Translation System for Tactical Use, SRI International has further developed two-way translation software for use in Iraq called IraqComm in 2006 which contains a vocabulary of 40,000 English words and 50,000 words in Iraqi Arabic. == Notable users == The handheld translator was recently used by U.S. troops while providing relief to tsunami victims in early 2005. About 500 prototypes of the device were provided to U.S. military forces in Operation Enduring Freedom. Units loaded with Haitian dialects have been provided to U.S. troops in Haiti. Army military police have used it in Kandahar to communicate with POWs. In late 2004, the U.S. Navy began to augment some ships with a version of the device attached to large speakers in order to broadcast clear voice instructions up to 400 yards (370 m) away. Corrections officers and law enforcement in Oneida County, New York, have tested the device. Hospital emergency rooms and health departments have also evaluated it. Several Native American tribes such as the Choctaw Nation, the Ponca, and the Comanche Nation have also used the device to preserve their dying languages. Various law enforcement agencies, such as the Los Angeles Police Department, also use the phraselator in their patrol cars. == Awards == In March 2004, DARPA director Dr. Tony Tether presented the Small Business Innovative Research Award to the VoxTec division of Marine Acoustics at DARPATech 2004 in Anaheim, CA. The device was recently listed as one of "Ten Emerging Technologies That Will Change Your World" in MIT's Technology Review. == Pop culture == Software developer Jack Buchanan believes that building a device similar to the fictional universal translator seen in Star Trek would be harder than building the Enterprise. The device was mentioned in a list of "Top 10 Star Trek Tech" on Space.com.
Film-out
Film-out is the process in the computer graphics, video production and filmmaking disciplines of transferring images or animation from videotape or digital files to a traditional film print. Film-out is a broad term that encompasses the conversion of frame rates, color correction, as well as the actual printing, also called scannior recording. The film-out process is different depending on the regional standard of the master videotape in question – NTSC, PAL, or SECAM – or likewise on the several emerging region-independent formats of high definition video (HD video); thus each type is covered separately, taking into account regional film-out industries, methods and technical considerations. == Live action video == Many modern documentaries and low-budget films are shot on videotape or other digital video media, instead of film stock, and completed as digital video. Video production means substantially lower costs than 16 mm or 35 mm film production on all levels. Until recently, the relatively low cost of video ended when the issue of a theatrical presentation was raised, which required a print for film projection. With the growing presence of digital projection, this is becoming less of a factor. === Standard definition (SD) video === Film-out of standard-definition video – or any source that has an incompatible frame rate – is the up-conversion of video media to film for theatrical viewing. The video-to-film conversion process consists of two major steps: first, the conversion of video into digital film frames which are then stored on a computer or on HD videotape; and secondly, the printing of these digital film frames onto actual film. To understand these two steps, it is important to understand how video and film differ. Film (sound film, at least) has remained unchanged for almost a century and creates the illusion of moving images through the rapid projection of still images, frames, upon a screen, typically 24 per second. Traditional interlaced SD video has no real frame rate, (though the term frame is applied to video, it has a different meaning). Instead, video consists of a very fast succession of horizontal lines that continually cascade down the television screen – streaming top to bottom, before jumping back to the top and then streaming down to the bottom again, repeatedly, almost 60 alternating screen-fulls every second for NTSC, or exactly 50 such screen-fulls per second for PAL and SECAM. Since visual movement in video is infused in this continuous cascade of scan lines, there is no discrete image or real frame that can be identified at any one time. Therefore, when transferring video to film, it is necessary to invent individual film frames, 24 for every second of elapsed time. The bulk of the work done by a film-out company is this first step, creating film frames out of the stream of interlaced video. Each company employs its own (often proprietary) technology for turning interlaced video into high-resolution digital video files of 24 discrete images every second, called 24 progressive video or 24p. The technology must filter out all the visually unappealing artifacting that results from the inherent mismatch between video and film movement. Moreover, the conversion process usually requires human intervention at every edit point of a video program, so that each type of scene can be calibrated for maximum visual quality. The use of archival footage in video especially calls for extra attention. Step two, the scanning to film, is the rote part of the process. This is the mechanical step where lasers print each of the newly created frames of the 24p video, stored on computer files or HD videotape, onto rolls of film. Most companies that do film-out, do all the stages of the process themselves for a lump sum. The job includes converting interlaced video into 24p and often a color correction session – (calibrating the image for theatrical projection), before scanning to physical film, (possibly followed by color correction of the film print made from the digital intermediary) – is offered. At the very least, film-out can be understood as the process of converting interlaced video to 24p and then scanning it to film. ==== NTSC video ==== NTSC is the most challenging of the formats when it comes to standards conversion and, specifically, converting to film prints. NTSC runs at the approximate rate of 29.97 video frames (consisting of two interlaced screen-fulls of scan lines, called fields, per frame) per second. In this way, NTSC resolves actual live action movement at almost – but not quite – 60 alternating half-resolution images every second. Because of this 29.97 rate, no direct correlation to film frames at 24 frames per second can be achieved. NTSC is hardest to reconcile with film, thus motivating its own unique processes. ==== PAL and SECAM video ==== PAL and SECAM run at 25 interlaced video frames per second, which can be slowed down or frame-dropped, then deinterlaced, to correlate frame for frame with film running at 24 actual frames per second. PAL and SECAM are less complex and demanding than NTSC for film-out. PAL and SECAM conversions do agitate, though, with the unpleasant choice between slowing down video (and audio pitch, noticeably) by four percent, from 25 to 24 frames per second, in order to maintain a 1:1 frame match, slightly changing the rhythm and feel of the program; or maintaining original speed by periodically dropping frames, thereby creating jerkiness and possible loss of vital detail in fast-moving action or precise edits. === High definition (HD) digital video === High definition digital video can be shot at a variety of frame rates, including 29.97 interlaced (like NTSC) or progressive; or 25 interlaced (like PAL) or progressive; or even 24-progressive (just like film). HD, if shot in 24-progressive, scans nearly perfectly to film without the need for a frame or field conversion process. Other issues remain though, based on the different resolutions, color spaces, and compression schemes that exist in the high-definition video world. == Computer graphics and animation == Artists working with CGI-Computer-generated imagery animation computers create pictures frame by frame. Once the finished product is done, the frames are outputted, normally in a DPX file. These picture data files can then be put on to film using a film recorder for film out. SGI computers started the high-end CGI-Computer-generated imagery animation systems, but with faster computers and the growth of Linux-based systems, many others are on the market now. Movies fully rendered and animated in CGI such as Toy Story, and Antz utilize the film-out method to produce 35mm copies for archival and release prints. Most CGI work is done in 2K Display resolution files (about the size of QXGA) and then output to the Film-out device for creation of 35 mm elements. With 4K Display resolution digital intermediates on the rise, newer types of film-out recorders are being developed to accept 4k resolution files. A 2K movie requires a Storage Area Network storage several terabytes in size to be properly stored and played out. Computer graphics files are handled the same way but in single frames and may use DPX, TIFF or other file formats. == Digital intermediates == Film-out-recording is the last step of digital intermediate workflow. DPX files that were scanned on a motion picture film scanner are stored on a storage area network (often abbreviated as SAN). The scanned DPX footage is edited and composited-FX on workstations, then mastered back on film. Film restoration is also done this way. A "film intermediate" is an analog variation of a digital intermediate, where a project shot on digital video is printed onto film stock and transferred back to digital video to emulate film. The term was coined after it was used on the Oscar-winning 2012 short film "Curfew". The process was also used on the films Dune (2021) and The Batman (2022). == Images for graphic design and print industries == The days of newspapers and magazines shooting 35mm film are almost gone. Digital cameras can now shoot all the images needed, storing them as files (e.g. JPEG, DPX or another format) that are readily edited prior to use. Once the final copy is approved, it can be filmed out for publishing. Digital stills are not the only way to get pictures used in the graphic design and print industries. Film scanners and computer graphics programs are also common sources for graphic design and print industries. == Types of devices == The following devices are used in film-out processes: CRT recorder. Camera and a special TV display Kinescope – early type Electronic Video Recording or EVR – early type EBR Electron Beam Film Recorder 16 mm by 3M Laser film recorder, like Kodak's high-end Lightning II recorder and Arri's Arrilaser. DLP Film recorder, like Cinevation's real-time Cinevator. == History == Lately it has become possible to transfer video images, inclu
Human-based evolutionary computation
Human-based evolutionary computation (HBEC) is a set of evolutionary computation techniques that rely on human innovation. == Classes and examples == Human-based evolutionary computation techniques can be classified into three more specific classes analogous to ones in evolutionary computation. There are three basic types of innovation: initialization, mutation, and recombination. Here is a table illustrating which type of human innovation are supported in different classes of HBEC: All these three classes also have to implement selection, performed either by humans or by computers. === Human-based selection strategy === Human-based selection strategy is a simplest human-based evolutionary computation procedure. It is used heavily today by websites outsourcing collection and selection of the content to humans (user-contributed content). Viewed as evolutionary computation, their mechanism supports two operations: initialization (when a user adds a new item) and selection (when a user expresses preference among items). The website software aggregates the preferences to compute the fitness of items so that it can promote the fittest items and discard the worst ones. Several methods of human-based selection were analytically compared in studies by Kosorukoff and Gentry. Because the concept seems too simple, most of the websites implementing the idea can't avoid the common pitfall: informational cascade in soliciting human preference. For example, digg-style implementations, pervasive on the web, heavily bias subsequent human evaluations by prior ones by showing how many votes the items already have. This makes the aggregated evaluation depend on a very small initial sample of rarely independent evaluations. This encourages many people to game the system that might add to digg's popularity but detract from the quality of the featured results. It is too easy to submit evaluation in digg-style system based only on the content title, without reading the actual content supposed to be evaluated. A better example of a human-based selection system is Stumbleupon. In Stumbleupon, users first experience the content (stumble upon it), and can then submit their preference by pressing a thumb-up or thumb-down button. Because the user doesn't see the number of votes given to the site by previous users, Stumbleupon can collect a relatively unbiased set of user preferences, and thus evaluate content much more precisely. === Human-based evolution strategy === In this context and maybe generally, the Wikipedia software is the best illustration of a working human-based evolution strategy wherein the (targeted) evolution of any given page comprises the fine tuning of the knowledge base of such information that relates to that page. Traditional evolution strategy has three operators: initialization, mutation, and selection. In the case of Wikipedia, the initialization operator is page creation, the mutation operator is incremental page editing. The selection operator is less salient. It is provided by the revision history and the ability to select among all previous revisions via a revert operation. If the page is vandalised and no longer a good fit to its title, a reader can easily go to the revision history and select one of the previous revisions that fits best (hopefully, the previous one). This selection feature is crucial to the success of the Wikipedia. An interesting fact is that the original wiki software was created in 1995, but it took at least another six years for large wiki-based collaborative projects to appear. Why did it take so long? One explanation is that the original wiki software lacked a selection operation and hence couldn't effectively support content evolution. The addition of revision history and the rise of large wiki-supported communities coincide in time. From an evolutionary computation point of view, this is not surprising: without a selection operation the content would undergo an aimless genetic drift and would unlikely to be useful to anyone. That is what many people expected from Wikipedia at its inception. However, with a selection operation, the utility of content has a tendency to improve over time as beneficial changes accumulate. This is what actually happens on a large scale in Wikipedia. === Human-based genetic algorithm === Human-based genetic algorithm (HBGA) provides means for human-based recombination operation (a distinctive feature of genetic algorithms). Recombination operator brings together highly fit parts of different solutions that evolved independently. This makes the evolutionary process more efficient.
Imagen (text-to-image model)
Imagen is a series of text-to-image models developed by Google DeepMind. They were developed by Google Brain until the company's merger with DeepMind in April 2023. Imagen is primarily used to generate images from text prompts, similar to Stability AI's Stable Diffusion, OpenAI's DALL-E, or Midjourney. The original version of the model was first discussed in a paper from May 2022. The tool produces high-quality images and is available to all users with a Google account through services including Gemini, ImageFX, and Vertex AI. == History == Imagen's original version was first presented in a paper published in May 2022. It featured the ability to generate high-fidelity images from natural language. The second version, Imagen 2 was released in December 2023. The standout feature was text and logo generation. Imagen 3 was released in August 2024. Google claims that the newest version provides better detail and lighting on generated images. On 20 May 2025 at Google I/O 2025 the company released an improved model, Imagen 4. == Technology == Imagen uses two key technologies. The first is the use of transformer-based large language models, notably T5, to understand text and subsequently encode text for image synthesis. The second is the use of cascaded diffusion models providing high-fidelity image generation. Imagen generates image in three stages, starting from a base of 64x64, then upsampled to 256x256 and 1024x1024. Imagen 4 generates image up to 2k. == Capabilities == Imagen can generate photorealistic images from text prompts. It can also create various styles, such as cinematic, 35mm film, illustration, and surreal. Like most text-to-image generative AI models, Imagen has difficulty rendering human fingers, text, ambigrams and other forms of typography. The model can generate images in five aspect ratios, namely 9:16, 3:4, 1:1, 4:3, and 16:9. Imagen can also refine already generated images by editing existing text prompts.
ECML PKDD
ECML PKDD, the European Conference on Machine Learning Principles and Practice of Knowledge Discovery in Databases, is one of the leading academic conferences on machine learning and knowledge discovery, held in Europe every year. == History == ECML PKDD is a merger of two European conferences, European Conference on Machine Learning (ECML) and European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). ECML and PKDD have been co-located since 2001; however, both ECML and PKDD retained their own identity until 2007. For example, the 2007 conference was known as "the 18th European Conference on Machine Learning (ECML) and the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD)", or in brief, "ECML/PKDD 2007", and both ECML and PKDD had their own conference proceedings. In 2008 the conferences were merged into one conference, and the division into traditional ECML topics and traditional PKDD topics was removed. The history of ECML dates back to 1986, when the European Working Session on Learning was first held. In 1993 the name of the conference was changed to European Conference on Machine Learning. PKDD was first organised in 1997. Originally PKDD stood for the European Symposium on Principles of Data Mining and Knowledge Discovery from Databases. The name European Conference on Principles and Practice of Knowledge Discovery in Databases was used since 1999. The conference remains highly competitive, consistently maintaining an average acceptance rate of around 25% for the main research track. == Upcoming conferences == == List of past conferences ==
You Only Look Once
You Only Look Once (YOLO) is a series of real-time object detection systems based on convolutional neural networks. First introduced by Joseph Redmon et al. in 2015, YOLO has undergone several iterations and improvements, becoming one of the most popular object detection frameworks. The name "You Only Look Once" refers to the fact that the algorithm requires only one forward propagation pass through the neural network to make predictions, unlike previous region proposal-based techniques like R-CNN that require thousands for a single image. == Overview == Compared to previous methods like R-CNN and OverFeat, instead of applying the model to an image at multiple locations and scales, YOLO applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities. === OverFeat === OverFeat was an early influential model for simultaneous object classification and localization. Its architecture is as follows: Train a neural network for image classification only ("classification-trained network"). This could be one like the AlexNet. The last layer of the trained network is removed, and for every possible object class, initialize a network module at the last layer ("regression network"). The base network has its parameters frozen. The regression network is trained to predict the ( x , y ) {\displaystyle (x,y)} coordinates of two corners of the object's bounding box. During inference time, the classification-trained network is run over the same image over many different zoom levels and croppings. For each, it outputs a class label and a probability for that class label. Each output is then processed by the regression network of the corresponding class. This results in thousands of bounding boxes with class labels and probability. These boxes are merged until only one single box with a single class label remains. == Versions == There are two parts to the YOLO series. The original part contained YOLOv1, v2, and v3, all released on a website maintained by Joseph Redmon. === YOLOv1 === The original YOLO algorithm, introduced in 2015, divides the image into an S × S {\displaystyle S\times S} grid of cells. If the center of an object's bounding box falls into a grid cell, that cell is said to "contain" that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the box is that it predicts. In more detail, the network performs the same convolutional operation over each of the S 2 {\displaystyle S^{2}} patches. The output of the network on each patch is a tuple as follows: ( p 1 , … , p C , c 1 , x 1 , y 1 , w 1 , h 1 , … , c B , x B , y B , w B , h B ) {\displaystyle (p_{1},\dots ,p_{C},c_{1},x_{1},y_{1},w_{1},h_{1},\dots ,c_{B},x_{B},y_{B},w_{B},h_{B})} where p i {\displaystyle p_{i}} is the conditional probability that the cell contains an object of class i {\displaystyle i} , conditional on the cell containing at least one object. x j , y j , w j , h j {\displaystyle x_{j},y_{j},w_{j},h_{j}} are the center coordinates, width, and height of the j {\displaystyle j} -th predicted bounding box that is centered in the cell. Multiple bounding boxes are predicted to allow each prediction to specialize in one kind of bounding box. For example, slender objects might be predicted by j = 2 {\displaystyle j=2} while stout objects might be predicted by j = 1 {\displaystyle j=1} . c j {\displaystyle c_{j}} is the predicted intersection over union (IoU) of each bounding box with its corresponding ground truth. The network architecture has 24 convolutional layers followed by 2 fully connected layers. During training, for each cell, if it contains a ground truth bounding box, then only the predicted bounding boxes with the highest IoU with the ground truth bounding boxes is used for gradient descent. Concretely, let j {\displaystyle j} be that predicted bounding box, and let i {\displaystyle i} be the ground truth class label, then x j , y j , w j , h j {\displaystyle x_{j},y_{j},w_{j},h_{j}} are trained by gradient descent to approach the ground truth, p i {\displaystyle p_{i}} is trained towards 1 {\displaystyle 1} , other p i ′ {\displaystyle p_{i'}} are trained towards zero. If a cell contains no ground truth, then only c 1 , c 2 , … , c B {\displaystyle c_{1},c_{2},\dots ,c_{B}} are trained by gradient descent to approach zero. === YOLOv2 === Released in 2016, YOLOv2 (also known as YOLO9000) improved upon the original model by incorporating batch normalization, a higher resolution classifier, and using anchor boxes to predict bounding boxes. It could detect over 9000 object categories. It was also released on GitHub under the Apache 2.0 license. === YOLOv3 === YOLOv3, introduced in 2018, contained only "incremental" improvements, including the use of a more complex backbone network, multiple scales for detection, and a more sophisticated loss function. === YOLOv4 and beyond === Subsequent versions of YOLO (v4, v5, etc.) have been developed by different researchers, further improving performance and introducing new features. These versions are not officially associated with the original YOLO authors but build upon their work. As of 2026, versions up to YOLO26 have been released..
The Raimones
The Raimones (stylized as THE RAiMONES) is a 2017 generative music project that utilized artificial intelligence to compose music in the style of the American punk rock band The Ramones. Developed by Matthias Frey, a researcher at Sony CSL Tokyo, the project was an early experiment in applying deep learning to high-energy, minimalist musical genres. == Technical Development == The project utilized Long short-term memory (LSTM) recurrent neural networks to generate musical structures and lyrics. The model was trained on a dataset consisting of 130 Ramones songs in MIDI format and the band's complete lyrical catalog. The technical framework was built using Python and Jupyter Notebook, drawing influence from the character-level RNN text generation models popularized by Andrej Karpathy. Unlike contemporary AI music projects that focused on the harmonic complexities of classical or pop music, THE RAiMONES sought to determine if neural networks could replicate the "1-2-3-4" rhythmic consistency and formulaic nature of early punk. == "I'm Alive" == The primary output of the project was the song "I'm Alive," released in 2017. The work is described as a form of "augmented intelligence," a hybrid approach where the AI provides the compositional foundation and human musicians handle the arrangement and performance. The song was recorded by the musician Mr. Ratboy (Gilbert Avondet). Avondet's involvement provided a stylistic link to the subject material, as he had previously served as a touring guitarist for Marky Ramone and the Intruders in 1996. The project's discography has since been made available on major streaming platforms, including Apple Music. == Reception and Significance == The project has been cited as a "proof of concept" for AI's ability to tackle "noisy" and aggressive aesthetics. In 2019, the Belgian magazine Knack Focus profiled the project alongside other AI pioneers such as Holly Herndon, noting the project's attempt to recreate the sound of "deceased legends" while maintaining a distinct, machine-like quality. It has also been featured in academic settings, such as at UC Santa Cruz, as a case study for AI-driven genre mimicry.