From: Murray Stokely
To: FreeBSD doc list
Date: Sun, 17 Jan 2010 23:57:17 -0800
Subject: Proposed new doc hierarchy for closed-captions / transcripts from conferences

As some of you may be aware, I have been working on getting closed captions for the videos of FreeBSD-related talks at conferences. In the last month I've started using YouTube's machine learning to produce a first automatic transcript, and then paying human editors through Amazon Mechanical Turk to improve the technical vocabulary and general editing of the transcripts. There are now four videos in the BSD Conferences YouTube channel with relatively good quality, human-edited English-language transcripts (e.g. the pointers at http://freebsd.stokely.org/2010/01/improved-conference-captions-from.html).

The caption files themselves are simple ASCII text files: one line for the start/end time of the text to be displayed, one or two lines for the text to be displayed, and a blank line to separate the next record.

I would like to start checking in these text files under doc/en_US.ISO8859-1/captions/ for a number of reasons:

1. I want to make it easier for others to correct any mistakes in the captions.
2. I want to make it easier for translators to produce localized captions for the most popular videos.
3. I want to keep a centralized repository of the captions outside of YouTube, so other hosting sites or systems are able to use them.
4. I want to increase discoverability of the technical content discussed in the conference talks by making indexable transcripts available to search engines.

The blog post above has some example text files that I'd like to check in. It then becomes a matter of choosing the hierarchy. I might suggest:

    doc/${LANG}/captions/${YEAR}/${CONFERENCE}/${TALK}

e.g.

    doc/en_US.ISO8859-1/captions/2009/asiabsdcon/mckusick-kernelinternals.sbv

Thoughts?

- Murray
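
For anyone who wants to script against these files, here is a rough sketch of how the record layout and the proposed hierarchy described above might be handled in Python. It is only an illustration, not part of the proposal. The timestamp layout (H:MM:SS.mmm,H:MM:SS.mmm) is an assumption based on the usual .sbv convention and is not spelled out in this message, and the names Caption, parse_captions, and caption_path are hypothetical.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List

# Assumed timestamp layout (not stated in the message): H:MM:SS.mmm,H:MM:SS.mmm
TIME_LINE = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)


@dataclass
class Caption:
    start: float      # seconds from the start of the video
    end: float        # seconds from the start of the video
    text: List[str]   # the one or two lines of text to display


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_captions(data: str) -> List[Caption]:
    """Split caption text into records separated by blank lines."""
    captions = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        match = TIME_LINE.match(lines[0])
        if match is None:
            raise ValueError(f"expected a start/end time line, got: {lines[0]!r}")
        fields = match.groups()
        captions.append(
            Caption(_seconds(*fields[:4]), _seconds(*fields[4:]), lines[1:])
        )
    return captions


def caption_path(lang: str, year: int, conference: str, talk: str) -> PurePosixPath:
    """Build doc/${LANG}/captions/${YEAR}/${CONFERENCE}/${TALK}."""
    return PurePosixPath("doc", lang, "captions", str(year), conference, talk)
```

Calling caption_path("en_US.ISO8859-1", 2009, "asiabsdcon", "mckusick-kernelinternals.sbv") reproduces the example path given above. A parser along these lines could also serve as a simple sanity check on submitted corrections or translations before they are committed.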