Peer-reviewed journal articles (11)
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
Wearable assistive devices for visually impaired: a state of the art survey. Pattern Recognition Letters, 2019, pp. 1-16 (in press; published online 29 october 2018)
abstract
Recent statistics from the World Health Organization (WHO), published in October 2017, estimate that more than 253 million people worldwide suffer from visual impairment (VI), including 36 million blind people and 217 million people with low vision. In the last decade, a tremendous amount of work has gone into developing wearable assistive devices dedicated to visually impaired people, aiming at increasing user cognition when navigating in known/unknown, indoor/outdoor environments, and designed to improve the quality of life of VI people. This paper presents a survey of wearable assistive devices and provides a critical presentation of each system, emphasizing its strengths and limitations. The paper is designed to inform the research community and VI people about the capabilities of existing systems and the progress in assistive technologies, and to provide a glimpse into possible short/medium-term axes of research that can improve existing devices. The survey is based on various features and performance parameters, established with the help of the blind community, that allow systems to be classified using both qualitative and quantitative measures of evaluation. This makes it possible to rank the analyzed systems based on their potential impact on the life of VI people.
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
DEEP-SEE FACE: a mobile face recognition system dedicated to visually impaired people. IEEE Access, september 2018, vol. 6, pp. 51975-51985
URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8466782
abstract
In this paper, we introduce the DEEP-SEE FACE framework, an assistive device designed to improve the cognition, interaction, and communication of visually impaired (VI) people in social encounters. The proposed approach jointly exploits computer vision algorithms (region proposal networks, ATLAS tracking, and global and low-level image descriptors) and deep convolutional neural networks in order to detect, track, and recognize, in real time, the various persons present in the video stream. The major contribution of the paper concerns a global, fixed-size face representation that takes into account various video frames while remaining independent of the length of the image sequence. To this purpose, we introduce an effective weight adaptation scheme that is able to determine the relevance assigned to each face instance, depending on the frame's degree of motion/camera blur, scale variation, and compression artifacts. Another relevant contribution involves a hard negative mining stage that helps us differentiate between known and unknown face identities. The experimental results, carried out on a large-scale data set, validate the proposed methodology with average accuracy and recognition rates superior to 92%. When tested in real-life, indoor/outdoor scenarios, the DEEP-SEE FACE prototype proves to be effective and easy to use, allowing VI people to access visual information during social events.
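To make the weight adaptation idea above concrete, here is a minimal Python sketch (illustrative only, not the authors' implementation; the embedding dimension and the quality scores used as weights are hypothetical stand-ins for the blur/scale/artifact measures mentioned in the abstract). It fuses per-frame face embeddings into one fixed-size descriptor, independent of the track length.

```python
import numpy as np

def aggregate_track(embeddings, blur_scores, face_scales):
    """Combine per-frame face embeddings into one fixed-size descriptor.

    embeddings  : (n_frames, d) array of face embeddings (e.g., CNN features)
    blur_scores : (n_frames,) higher = sharper frame (hypothetical score)
    face_scales : (n_frames,) face size in pixels (larger = more reliable)
    """
    # Quality weight per frame: sharp, large faces contribute more.
    quality = blur_scores * face_scales
    weights = quality / quality.sum()          # normalize to sum to 1
    # The weighted average is independent of the track length.
    descriptor = (weights[:, None] * embeddings).sum(axis=0)
    # L2-normalize so descriptors are comparable across tracks.
    return descriptor / np.linalg.norm(descriptor)

# Toy usage: a 5-frame track of 128-D embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 128))
desc = aggregate_track(emb,
                       blur_scores=np.array([0.9, 0.2, 0.8, 0.7, 0.5]),
                       face_scales=np.array([80, 40, 120, 100, 60]))
print(desc.shape)  # (128,) regardless of how many frames were tracked
```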
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
DEEP-SEE: joint object detection, tracking and recognition with application to visually impaired navigational assistance. Sensors, november 2017, vol. 17, n° 11, pp. 2473-1-2473-24
URL: http://www.mdpi.com/1424-8220/17/11/2473/pdf
abstract
In this paper, we introduce the so-called DEEP-SEE framework that jointly exploits computer vision algorithms and deep convolutional neural networks (CNNs) to detect, track and recognize, in real time, objects encountered during navigation in the outdoor environment. A first feature concerns an object detection technique designed to localize both static and dynamic objects without any a priori knowledge about their position, type or shape. The methodological core of the proposed approach relies on a novel object tracking method based on two convolutional neural networks trained offline. The key principle consists of alternating between tracking using motion information and predicting the object location in time based on visual similarity. The validation of the tracking technique is performed on standard VOT benchmark datasets, and shows that the proposed approach returns state-of-the-art results while minimizing the computational complexity. Then, the DEEP-SEE framework is integrated into a novel assistive device, designed to improve the cognition of VI people and to increase their safety when navigating in crowded urban scenes. The validation of our assistive device is performed on a dataset of 30 videos acquired with the help of VI users. The proposed system shows high accuracy (>90%) and robustness (>90%) scores regardless of the scene dynamics.
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
A computer vision-based perception system for visually impaired. Multimedia Tools and Applications, may 2017, vol. 76, n° 9, pp. 11771-11807
abstract
In this paper, we introduce a novel computer vision-based perception system dedicated to the autonomous navigation of visually impaired people. A first feature concerns the real-time detection and recognition of obstacles and moving objects present in potentially cluttered urban scenes. To this purpose, a motion-based, real-time object detection and classification method is proposed. The method requires no a priori information about the obstacle type, size, position or location. In order to enhance the navigation/positioning capabilities offered by traditional GPS-based approaches, which are often unreliable in urban environments, a building/landmark recognition approach is also proposed. Finally, for the specific case of indoor applications, the system can learn a set of user-defined objects of interest. Here, multi-object identification and tracking is applied in order to guide the user to localize such objects of interest. The feedback is presented to the user as audio warnings/alerts/indications. Bone conduction headphones are employed in order to allow visually impaired users to hear the system's warnings without obstructing the sounds from the environment. At the hardware level, the system is fully integrated on an Android smartphone, which makes it easy to wear, non-invasive and low-cost.
TRUONG Arthur, ZAHARIA Titus
Laban movement analysis and hidden Markov models for dynamic 3D gesture recognition. EURASIP Journal on Image and Video Processing, august 2017, vol. 2017, n° 1, pp. 52-1-52-16
URL: https://jivp-eurasipjournals.springeropen.com/articles/10.1186/s13640-017-0202-5
abstract
In this paper, we propose a new approach for body gesture recognition. The body motion features considered quantify a set of Laban Movement Analysis (LMA) concepts. These features are used to build a dictionary of reference poses, obtained with the help of a k-medians clustering technique. Then, a soft assignment method is applied to the gesture sequences to obtain a gesture representation. The assignment results are used as input in a Hidden Markov Models (HMM) scheme for dynamic, real-time gesture recognition purposes. The proposed approach achieves high recognition rates (more than 92% for certain categories of gestures), when tested and evaluated on a corpus including 11 different actions. The high recognition rates obtained on two other datasets (Microsoft Gesture dataset and UTKinect-Human Detection dataset) show the relevance of our method
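The dictionary-plus-soft-assignment step described above can be sketched in a few lines of Python (an illustrative toy, not the authors' code; the feature dimension, the number of reference poses and the Gaussian kernel are assumptions). Each frame's LMA feature vector is converted into a distribution over reference poses, which is the sequence representation fed to the HMM stage.

```python
import numpy as np

def soft_assign(features, dictionary, sigma=1.0):
    """Soft-assign per-frame LMA feature vectors to a dictionary of reference poses.

    features   : (n_frames, d) per-frame motion features
    dictionary : (k, d) reference poses (e.g., from k-medians clustering)
    Returns    : (n_frames, k) assignment weights, each row summing to 1.
    """
    # Squared Euclidean distances between every frame and every reference pose.
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))   # closer poses get larger weights
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
feats = rng.normal(size=(30, 10))      # a 30-frame gesture, 10-D LMA features
poses = rng.normal(size=(8, 10))       # 8 reference poses
seq = soft_assign(feats, poses)        # sequence that would feed the HMM stage
print(seq.shape, seq.sum(axis=1)[:3])  # (30, 8), rows sum to 1
```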
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
When ultrasonic sensors and computer vision join forces for efficient obstacle detection and recognition. Sensors, october 2016, vol. 16, n° 11, pp. 1807-1-1807-23
URL: http://www.mdpi.com/1424-8220/16/11/1807/pdf
abstract
In the most recent report published by the World Health Organization concerning people with visual disabilities, it is highlighted that by the year 2020, worldwide, the number of completely blind people will reach 75 million, while the number of visually impaired (VI) people will rise to 250 million. For this reason, the development of electronic travel aid (ETA) systems able to increase the safe mobility of VI people in indoor/outdoor spaces, while providing additional cognition of the environment, is mandatory. In this paper, we introduce a novel wearable assistive device designed to facilitate autonomous navigation in highly dynamic urban scenes. By joining two independent sources of information, ultrasonic sensors and the video camera embedded in a regular smartphone, the system can identify with high confidence static or highly dynamic objects in the scene, regardless of their location, size or shape. In addition, the proposed system is able to acquire information about the environment, semantically interpret it, and alert users about possibly dangerous situations through acoustic feedback. To determine the performance of the proposed methodology, we performed an extensive objective and subjective experimental evaluation with the help of 21 VI subjects from two blind associations. At the end of the testing phase, users pointed out that our prototype is very helpful in increasing mobility, while being friendly and easy to learn.
TRUONG Arthur, BOUJUT Hugo, ZAHARIA Titus
Laban descriptors for gesture recognition and emotional analysis. The Visual Computer, january 2016, vol. 32, n° 1, pp. 83-98
abstract
In this paper, we introduce a new set of 3D gesture descriptors based on the Laban Movement Analysis model. The proposed descriptors are used in a machine learning framework (with SVM and different random forest techniques) for both gesture recognition and emotional analysis purposes. In a first experiment, we test our expressivity model for action recognition purposes on the Microsoft Research Cambridge-12 dataset and obtain very high recognition rates (more than 97%). In a second experiment, we test our descriptors' ability to qualify the emotional content on a database of pre-segmented orchestra conductors' gestures recorded in rehearsals. The results obtained show the relevance of our model, which outperforms results reported in similar works on emotion recognition.
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
3D object metamorphosis with pseudo metameshes. Advances in Electrical and Computer Engineering, february 2015, vol. 15, n° 1, pp. 115-122
URL: http://www.aece.ro/displaypdf.php?year=2015&number=1&article=16
abstract
In this paper we introduce a novel framework for the metamorphosis of 3D objects represented by closed triangular meshes. The system returns a high-quality transition sequence, smooth and gradual, that is visually pleasant and consistent with both the source and target topologies. The method starts by parameterizing both the source and the target model over a common domain (the unit sphere). Then, the features selected from the two models are aligned by applying the CTPS C2a radial basis functions. We demonstrate how the selected approach can create valid warping by deforming the models embedded into the parametric domain. In the final stage, we propose and validate a novel algorithm to construct a pseudo-supermesh able to approximate both the source and target 3D objects. Using the pseudo-supermesh, we develop a morphing transition that is consistent with both the geometry and topology of the 3D models.
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
ALICE: a smartphone assistant used to increase the mobility of visual impaired people. Journal of Ambient Intelligence and Smart Environments, september 2015, vol. 7, n° 5, pp. 659-678
abstract
In this paper we introduce a novel assistant for the autonomous navigation of partially sighted people. The system, called ALICE, offers information about the location and the possible directions a visually impaired user must follow in order to reach the desired destination. The navigation is complemented by a novel computer vision method that is able to detect and classify, in real time, both static and dynamic obstacles without any a priori information about the obstacle type, size, position or location. The GPS localization is enriched with a visual landmark recognition technique. Finally, a set of audio warnings alerts the user to potential hazards. For the feedback, bone conduction headphones are employed in order to allow the visually impaired to hear the system's warnings, but also the other sounds from the environment. At the hardware level, the system is fully integrated on a smartphone, which makes it easy to wear, non-invasive and low-cost.
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
Automatic assistant for better mobility and improved cognition of partially sighted persons. Advances in Electrical and Computer Engineering, august 2015, vol. 15, n° 3, pp. 45-52
URL: http://www.aece.ro/displaypdf.php?year=2015&number=3&article=6
abstract
In this paper we introduce a novel computer vision assistant for the autonomous navigation of partially sighted people. We begin by detecting any type of static or dynamic obstacle present in the scene. Then, we introduce an adapted version of the HOG (Histogram of Oriented Gradients) descriptor incorporated into the BoVW (Bag of Visual Words) retrieval framework and demonstrate how this combination can be used for obstacle classification. The design is completed with acoustic feedback that alerts the user to potential hazards. Bone conduction headphones are employed to allow the visually impaired to also hear the other sounds from the environment. At the hardware level, the system is fully integrated on a smartphone, which makes it easy to wear, non-invasive and low-cost.
GUERCHOUCHE Rachid, BERNIER Olivier, ZAHARIA Titus
Multiresolution 3D object reconstruction for collaborative interactions. Pattern Recognition and Image Analysis, december 2008, vol. 18, n° 4, pp. 621-637
abstract
Within the framework of collaborative interactions, with 3D numerical copies of real objects inserted in virtual environments, this paper tackles the issue of 3D object reconstruction from multiple calibrated cameras. After examining the various constraints related to collaborative systems, we propose a comprehensive state of the art of 3D reconstruction techniques. The main families of approaches are identified, described and discussed in detail. The analysis of the literature shows that there is a lack of methods that are able to respond to the needs of collaborative interaction applications and that perform adequately in terms of computation time and reconstruction accuracy. Accordingly, we propose a new multiresolution volumetric approach that is able to obtain numerical copies of real objects at multiple resolutions. Experimental results demonstrate that the proposed approach provides accurate reconstructions at reasonable, interactive computation times. The use of the proposed approach for the progressive insertion of reconstructed objects in the prototype interfaces MOWGLI and Spin-3D, developed by France Telecom R&D, is also illustrated.
Peer-reviewed conference papers (46)
PANOVSKI Dancho, SCURTU Veronica, ZAHARIA Titus
A neural network-based approach for public transportation prediction with traffic density matrix. EUVIP 2018: 7th European Workshop on Visual Information Processing, Los Alamitos : IEEE Computer Society, 26-28 november 2018, Tampere, Finland, 2018, pp. 1-6, ISBN 978-1-5386-6897-9
abstract
In today's modern cities, mobility is of crucial importance, and public transportation is particularly concerned. The main objective is to propose solutions to a given, practical problem, which specifically concerns the bus arrival time at various bus stop stations, by taking into account local traffic conditions. We show that a global prediction approach, based on global macro-parameters (e.g., total number of vehicles or pedestrians), is not feasible. This observation leads us to the introduction of a finer-granularity approach, where the traffic conditions are represented in terms of a traffic density matrix. Under this new paradigm, the experimental results obtained with both linear and neural network (NN) approaches show promising prediction performances. Thus, the NN approach yields 24% more accurate predictions than a basic linear regression.
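The density-matrix idea can be illustrated with a small scikit-learn experiment (entirely synthetic; the grid size, target model and data are made up for the sketch and will not reproduce the paper's 24% figure). A flattened traffic density matrix serves as the input of both a linear regression and a small neural network predicting bus arrival time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
# Synthetic stand-in for a traffic density matrix: each sample is an 8x8 grid
# of local vehicle densities, flattened; the target is bus arrival time (s).
X = rng.uniform(0, 1, size=(500, 64))
y = 120 + 300 * X[:, :16].mean(axis=1) + rng.normal(0, 10, size=500)

split = 400
lin = LinearRegression().fit(X[:split], y[:split])
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                   random_state=0).fit(X[:split], y[:split])

# On this toy both models do well; the paper's gain comes from real traffic.
for name, model in [("linear", lin), ("neural net", mlp)]:
    err = np.abs(model.predict(X[split:]) - y[split:]).mean()
    print(f"{name}: mean absolute error = {err:.1f} s")
```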
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
Face recognition in video streams for mobile assistive devices dedicated to visually impaired. SITIS 2018: 14th international conference on Signal Image Technology and Internet-based Systems, Los Alamitos : IEEE Computer Society, 26-29 november 2018, Las Palmas de Gran Canaria, Spain, 2018, pp. 137-142, ISBN 978-1-5386-9385-8
abstract
In this paper, we introduce a novel face detection and recognition system based on deep convolutional networks, designed to improve visually impaired users' interaction and communication in social encounters. A first feature of the proposed architecture concerns a face detection system able to identify the various persons present in the scene regardless of the subject's location or pose. Then, the faces are tracked between successive frames using a CNN (Convolutional Neural Network)-based tracker trained offline with generic motion patterns. The system can handle face occlusion, rotation or pose variation, as well as important illumination changes. Finally, the faces are recognized, in real time, directly from the video stream. The major contribution of the paper consists in a novel weight adaptation scheme able to determine the relevance of face instances and to create a global, fixed-size representation from all face instances tracked during the video stream, while remaining independent of the track length. The experimental evaluation is performed on a set of 30 videos acquired with the help of visually impaired users. The experimental results validate the approach with average detection and recognition scores superior to 85%.
HASCOET Nicolas, ZAHARIA Titus
Local feature selection for urban image retrieval. ISSCS 2017 : International Symposium on Signals, Circuits and Systems, Los Alamitos : IEEE Computer Society, 13-14 july 2017, Iasi, Romania, 2017, pp. 1-4, ISBN 978-1-5386-0674-2
abstract
In this paper, we propose an improved image retrieval method dedicated to images of buildings/landmarks from urban environments. Locally detected key points are binary-labelled as building or non-building using an SVM-based classifier. Thereafter, only key points labelled as building are retained. In this way, the database vocabulary is reduced to the relevant data only, and solely the features that effectively describe the targeted buildings are considered. The experimental results, carried out on the Paris6k and Oxford5k data sets, show a significant improvement in terms of retrieval precision.
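The keypoint filtering step can be sketched as follows (a hypothetical toy in Python: the descriptors are random stand-ins for real local features, and the SVM is a generic scikit-learn classifier rather than the paper's trained model). Only key points predicted as 'building' survive and would then enter the retrieval vocabulary.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Hypothetical training data: 128-D local descriptors (SIFT-like),
# labelled 1 = building, 0 = non-building (sky, vegetation, vehicles...).
X_train = np.vstack([rng.normal(0.3, 1, (200, 128)),
                     rng.normal(-0.3, 1, (200, 128))])
y_train = np.array([1] * 200 + [0] * 200)
clf = SVC(kernel="rbf").fit(X_train, y_train)

# At query time, keep only key points classified as 'building' before
# they enter the retrieval pipeline.
query_descriptors = rng.normal(0.2, 1, (50, 128))
keep = clf.predict(query_descriptors) == 1
filtered = query_descriptors[keep]
print(f"{keep.sum()} of {len(keep)} key points retained")
```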
HASCOET Nicolas, ZAHARIA Titus
Building recognition with adaptive interest point selection. ICCE 2017 : International Conference on Consumer Electronics, Los Alamitos : IEEE Computer Society, 08-10 january 2017, Las Vegas, United States, 2017, pp. 1-4, ISBN 978-1-5090-5544-9
abstract
In this paper, we propose an improvement of image retrieval for building images using the Bag of Words (BoW) model. The principle consists of pre-processing the interest points detected on the images in order to classify them into two classes, corresponding to building and non-building key points. In this way, the data involved in comparisons is reduced to the relevant subset, and only the features describing the buildings are taken into account. The experimental results, carried out on the Paris6k data set, show a significant improvement in terms of retrieval performance.
KRISTAN Matej, LEONARDIS Ales, MATAS Jiri, FELSBERG Michael, PFLUGFELDER Roman, ČEHOVIN ZAJC Luka, VOJIR Tomáš, HAGER Gustav, LUKEZIC Alan, ELDESOKEY Abdelrahman, FERNANDEZ Gustavo, GARCIA-MARTIN Alvaro, MUHIC Andrej, PETROSINO Alfredo, MEMARMOGHADAM Alireza, VEDALDI Andrea, MANZANERA Antoine, TRAN Antoine, ALATAN Aydin, MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
The Visual Object Tracking VOT2017 challenge results. ICCV 2017 : International Conference on Computer Vision Workshops, The Computer Vision Foundation, 22-29 october 2017, Venice, Italy, 2017, pp. 1949-1972
URL: http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w28/Kristan_The_Visual_Object_ICCV_2017_paper.pdf
abstract
The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or journals in recent years. The evaluation included the standard VOT and other popular methodologies, and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. VOT2017 goes beyond its predecessors by (i) improving the VOT public dataset and introducing a separate VOT2017 sequestered dataset, (ii) introducing a real-time tracking experiment and (iii) releasing a redesigned toolkit that supports complex experiments. The dataset, the evaluation kit and the results are publicly available at the challenge website.
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
Automatic extraction of story units from TV news. ICCE 2017 : International Conference on Consumer Electronics, Los Alamitos : IEEE Computer Society, 08-10 january 2017, Las Vegas, United States, 2017, pp. 1-2, ISBN 978-1-5090-5544-9
abstract
In this paper we propose a novel method for the automatic identification of semantically consistent story units from TV news programs. The method includes a temporal segmentation procedure of the video based on visual cues, as well as a graph-driven textual analysis technique applied to the subtitle documents. The experimental results, obtained on a dataset of 50 videos selected from a one-week video archive of France Television, demonstrate the pertinence of the proposed approach.
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
Object tracking using deep convolutional neural networks and visual appearance models. ACIVS 2017 : International Conference on Advanced Concepts for Intelligent Vision Systems, Cham : Springer, 18-21 september 2017, Antwerp, Belgium, 2017, pp. 114-125, ISBN 978-3-319-70352-7
abstract
In this paper we introduce a novel single object tracking method that extends the traditional GOTURN algorithm with a visual attention model. The proposed approach returns accurate object tracks and is able to handle sudden camera and background movement, long-term occlusions and multiple moving objects that can evolve simultaneously in the same neighborhood. The process of occlusion identification is performed using image quad-tree decomposition and patch matching, based on a convolutional neural network trained offline. The object appearance model is adaptively modified over time based on both visual similarity constraints and trajectory verification tests. The experimental evaluation performed on the VOT 2016 dataset demonstrates the efficiency of our method, which returns high accuracy scores regardless of the scene dynamics or object shape.
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
Seeing without sight: an automatic cognition system dedicated to blind and visually impaired people. ICCV 2017 : International Conference on Computer Vision, The Computer Vision Foundation, 22-29 october 2017, Venice, Italy, 2017, pp. 1452-1459
URL: http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w22/Tapu_Seeing_Without_Sight_ICCV_2017_paper.pdf
abstract
In this paper we present an automatic cognition system, based on computer vision algorithms and deep convolutional neural networks, designed to assist visually impaired (VI) users during navigation in highly dynamic urban scenes. A first feature concerns the real-time detection of the various types of objects in the outdoor environment that are relevant from the perspective of a VI person. The objects are followed between successive frames using a novel tracker, which exploits an offline-trained neural network and is able to track generic objects using motion patterns and visual attention models. The system is able to handle occlusions, sudden camera/object movements, rotation or various complex changes. Finally, an object classification module is proposed that exploits the YOLO algorithm and extends it with new categories specific to assistive device applications. The feedback to VI users is transmitted as a set of acoustic warning messages through bone conduction headphones. The experimental evaluation, performed on the VOT 2016 dataset and on a set of videos acquired with the help of VI users, demonstrates the effectiveness and efficiency of the proposed method.
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
ATLAS: adaptive single object tracking using offline learned motion and visual patterns. BMVC 2017 : British Machine Vision Conference Workshops, 04-07 september 2017, London, United Kingdom, 2017, vol. AMMDS07, pp. 1-12
URL: https://bmvc2017.london/proceedings/
abstract
In this paper we introduce ATLAS, a novel generic single object tracker based on two convolutional neural networks (CNN) trained offline. The key principle consists in alternating between tracking using motion information and predicting the object location in time based on visual similarity. The proposed tracker uses a regression-based approach to learn offline generic relationships between object appearances and the associated motion patterns. Then, by continuously updating the target appearance model, the system adaptively modifies the object bounding box position, size and shape. Starting from the initial candidate location estimated using motion patterns, the object's position is successively shifted within the context search area based on a patch similarity function that does not require any manually designed features. The final track location corresponds to the instance that provides the maximum similarity value. The experimental evaluation, performed on the challenging datasets considered by the Visual Object Tracking (VOT) international contest in 2016 (http://www.votchallenge.net/), demonstrates the performance of our technique when compared with state-of-the-art methods. Our tracker runs at more than 20 fps using generic motion and visual patterns.
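The alternation between motion-based prediction and similarity-based refinement can be caricatured in plain Python (a toy skeleton only: the two stub functions stand in for the paper's two offline-trained CNNs, and the box arithmetic is invented for the sketch):

```python
import numpy as np

def predict_motion(prev_box, velocity):
    """Stage 1 (stand-in): predict the next location from motion patterns."""
    x, y, w, h = prev_box
    return (x + velocity[0], y + velocity[1], w, h)

def refine_by_similarity(candidate_box, similarity):
    """Stage 2 (stand-in): shift the box toward the most similar patch.

    `similarity` maps a box to a score; here we probe small offsets inside a
    search area and keep the best one, imitating the successive-shift step.
    """
    best = candidate_box
    for dx in (-4, 0, 4):
        for dy in (-4, 0, 4):
            probe = (candidate_box[0] + dx, candidate_box[1] + dy,
                     candidate_box[2], candidate_box[3])
            if similarity(probe) > similarity(best):
                best = probe
    return best

# Toy run: the 'true' object drifted right; similarity peaks at its position.
true_pos = np.array([104.0, 50.0])
sim = lambda b: -np.hypot(b[0] - true_pos[0], b[1] - true_pos[1])
box = (100, 50, 32, 32)
box = predict_motion(box, velocity=(2, 0))    # motion-based prediction
box = refine_by_similarity(box, sim)          # visual-similarity refinement
print(box)
```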
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
Single object tracking using offline trained deep regression networks. IPTA 2017 : 7th International Conference on Image Processing Theory, Tools and Applications , Los Alamitos : IEEE Computer Society, 28 november - 01 december 2017, Montreal, Canada, 2017, pp. 1-6, ISBN 978-1-5386-1842-4
abstract
In this paper we introduce a novel single object tracker based on two convolutional neural networks (CNNs) trained offline using data from large video repositories. The key principle consists of alternating between tracking using motion information and adjusting the predicted location based on visual similarity. First, we construct a deep regression network architecture able to learn generic relations between the object appearance models and the associated motion patterns. Then, based on visual similarity constraints, the object's bounding box position, size and shape are continuously updated in order to maximize a patch similarity function designed using a CNN. Finally, a multi-resolution fusion between the outputs of the two CNNs is performed for accurate object localization. The experimental evaluation performed on the challenging datasets proposed in the Visual Object Tracking (VOT) international contest validates the proposed method when compared with state-of-the-art systems. In terms of computational speed, our tracker runs at 20 fps.
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
Using computer vision to see. ECCV 2016 : 14th European Conference on Computer Vision Workshops, Cham : Springer, 08-16 october 2016, Amsterdam, Netherlands, 2016, pp. 375-390, ISBN 978-3-319-48880-6
abstract
In this paper we propose a navigation assistant for visually impaired people, which uses computer vision techniques and is integrated on a wearable device. The system makes it possible to detect and recognize, in real time, both static and dynamic objects present in outdoor urban scenes without any a priori knowledge about the obstruction type or location. The detection system is based on relevant interest point extraction and tracking, background/camera motion estimation and foreground object identification through motion vector clustering. The classification method receives as input the image patches extracted by the detection module, performs global image representation using binary VLAD, and performs prediction based on SVM. The feedback of our system is transmitted to visually impaired users through bone-conduction headphones as a set of audio warning messages. The entire system is fully integrated on a regular smartphone. The experimental evaluation performed on a set of 20 videos acquired with the help of VI users demonstrates the pertinence of the proposed methodology.
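For reference, the VLAD aggregation mentioned above follows a standard recipe: assign each local descriptor to its nearest codebook centroid and accumulate the residuals. The sketch below is standard VLAD, not the binary variant used in the paper, and the dimensions and codebook are toy values.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Vector of Locally Aggregated Descriptors.

    For each local descriptor, accumulate its residual to the nearest
    codebook centroid, then flatten and normalize the result.
    """
    k, d = centroids.shape
    v = np.zeros((k, d))
    # Nearest centroid for every descriptor.
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            v[i] = (members - centroids[i]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))     # power-law normalization
    return v / (np.linalg.norm(v) + 1e-12)  # L2 normalization

rng = np.random.default_rng(4)
desc = rng.normal(size=(300, 64))    # local descriptors from one image patch
codebook = rng.normal(size=(16, 64)) # toy codebook (e.g., k-means centroids)
print(vlad(desc, codebook).shape)    # (1024,) fixed-size representation
```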
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
Image re-ranking using graph based spanning structures and reciprocal nearest neighbors. ICCE 2016 : International Conference on Consumer Electronics, Los Alamitos : IEEE Computer Society, 07-11 january 2016, Las Vegas, United States, 2016, pp. 437-438, ISBN 978-1-4673-8364-6
abstract
In this paper we propose a novel method to improve the performance of image retrieval at the VLAD (Vector of Locally Aggregated Descriptors) level. Our re-ranking algorithm uses relational graphs and the top-k neighborhood candidates to adaptively modify image similarity scores. The method is effective and increases the accuracy without relying on low-level information or on the geometrical verification of features.
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
Automatic segmentation of TV news into stories using visual and temporal information. ACIVS 2016 : 17th International Conference on Advanced Concepts for Intelligent Vision Systems, Cham : Springer, 24-27 october 2016, Lecce, Italy, 2016, pp. 648-660, ISBN 978-3-319-48679-6
abstract
In this paper we propose a new method for the automatic storyboard segmentation of TV news using image retrieval techniques and content manipulation. Our framework performs shot boundary detection, global key-frame representation, image re-ranking based on neighborhood relations, and temporal variance analysis of image locations in order to construct a unimodal cluster for anchor person detection and differentiation. Finally, anchor shots are used to form video scenes. The entire technique is unsupervised, being able to learn semantic models and extract natural patterns from the current video data. The experimental evaluation performed on a dataset of 50 videos, totaling more than 30 hours, demonstrates the pertinence of the proposed method, with gains of more than 5-7% in recall and precision rates when compared with state-of-the-art techniques.
PANOVSKI Dancho, ZAHARIA Titus
Simulation-based vehicular traffic lights optimization. SITIS 2016 : 12th International Conference on Signal Image Technology & Internet Based Systems, Los Alamitos : IEEE Computer Society, 28 november - 01 december 2016, Naples, Italy, 2016, pp. 258-265, ISBN 978-1-5090-5698-9
abstract
A great challenge today in urban areas and densely populated cities is determining an optimal traffic light system that can maximize the number of passing vehicles in a minimum amount of time. In this paper, we address the issue of traffic flow management in urban areas by proposing a traffic light optimization method. The proposed approach uses the SUMO simulator as a cost-effective solution and a PSO optimization technique for the traffic light cycle program. A series of experimental simulations were performed with different numbers of vehicles over different timetables. The results obtained show significant improvements in terms of increasing the number of vehicles that complete the simulation (4.5% to 10.5% gain) and the average journey time necessary for the vehicles to reach their destination (5.37% to 21.53% less time loss).
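The optimization loop can be sketched with a generic particle swarm over green-phase durations (illustrative Python only: the cost function below is a quadratic stub, whereas in the paper each candidate cycle program would be evaluated by running a SUMO simulation):

```python
import numpy as np

def pso(cost, dim, n_particles=20, iters=60, lo=5.0, hi=60.0, seed=0):
    """Minimal particle swarm optimization over green-phase durations."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))      # candidate cycle programs
    v = np.zeros_like(x)
    pbest, pbest_c = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[pbest_c.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward personal and global bests.
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        c = np.array([cost(p) for p in x])
        better = c < pbest_c
        pbest[better], pbest_c[better] = x[better], c[better]
        gbest = pbest[pbest_c.argmin()].copy()
    return gbest, pbest_c.min()

# Stand-in cost: in the paper this would be a SUMO run returning the average
# journey time obtained with a given traffic-light cycle program.
def journey_time(phases):
    return ((phases - np.array([20, 35, 15, 40])) ** 2).sum()

best, best_cost = pso(journey_time, dim=4)
print(np.round(best, 1), round(best_cost, 2))
```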
TAPU Ruxandra, MOCANU Bogdan, ZAHARIA Titus
TV news retrieval based on story segmentation and concept association. SITIS 2016 : 12th International Conference on Signal Image Technology & Internet Based Systems, Los Alamitos : IEEE Computer Society, 28 november - 01 december 2016, Naples, Italy, 2016, pp. 327-334, ISBN 978-1-5090-5698-9
abstract
In this paper we propose a novel method for TV news retrieval. A first stage concerns a temporal segmentation into story units. Then, for each story, the most relevant concepts are extracted based on a multimodal fusion between visual and textual information. By analyzing the video stream, we perform global frame representation, image retrieval and re-ranking, in order to determine, with high confidence, the segment boundaries. In addition, by using the video subtitles, we identify the most relevant concepts/topics addressed in each independent segment. The framework is evaluated using a one-week video archive of France Television and 20 news programs from the NBC and CNN TV stations. For the temporal video segmentation, our system returns high precision and recall scores, superior to 90%. Regarding the topic association technique, we obtain a mean average precision score superior to 0.5.
TRUONG Arthur, ZAHARIA Titus
Laban movement analysis for real-time 3D gesture recognition. MEASURING BEHAVIOR 2016 : 10th International Conference on Methods and Techniques in Behavioral Research, A.J. Spink et al. (Eds.), 25-27 may 2016, Dublin, Ireland, 2016, pp. 514-521, ISBN 978-1-873769-59-1
URL: http://www.measuringbehavior.org/files/2016/MB2016_Proceedings.pdf
abstract
In this paper, we propose a new method for body gesture recognition based upon Laban Movement Analysis (LMA). The features are computed for a dataset of pre-segmented sequences involving 11 different actions, and are used to build a dictionary of key poses, obtained with the help of a k-means clustering approach. A soft assignment method based upon the obtained poses is applied to the dataset, and the assignment results are used as input sequences in a Hidden Markov Models (HMM) framework for real-time action recognition purposes. The high recognition rates obtained (more than 92% for certain gestures) demonstrate the pertinence of the proposed method.
TRUONG Arthur, ZAHARIA Titus
Dynamic gesture recognition with Laban Movement Analysis and Hidden Markov Models. CGI 2016 : 33rd Computer Graphics International, New York : ACM, 28 june - 01 july 2016, Heraklion, Greece, 2016, pp. 21-24, ISBN 978-1-4503-4123-3
abstract
In this paper, we propose a new approach for gesture recognition based upon the quantification of Laban Movement Analysis (LMA) concepts. The resulting body features are used to build a dictionary of key poses. Then, a soft assignment method is applied to the gesture sequences to obtain a gesture representation. The assignment results are used as input in a Hidden Markov Models (HMM) scheme for dynamic gesture recognition purposes. The proposed approach achieves high recognition rates (more than 92% for certain categories of gestures), when tested and evaluated on a corpus including 11 different actions. The high recognition rates obtained on two other datasets (Microsoft Gesture dataset [1] and UTKinect-Human Detection dataset [2]) show the relevance of our method
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
An outdoor cognition system integrated on a regular smartphone device. EHB 2015 : 5th International Conference on e-Health and Bioengineering, IEEE, 19-21 november 2015, Iasi, Romania, 2015, pp. 1-4, ISBN 978-1-4673-7544-3
abstract
In this paper we introduce an assistive device dedicated to visually impaired/blind people, completely integrated on a regular smartphone. The framework is designed to detect and localize static and dynamic obstacles during user navigation. We start by selecting a reduced and relevant set of FAST interest points based on a regular grid and the Harris-Laplacian operator. Then, we construct a global image representation using VLAD (Vector of Locally Aggregated Descriptors) that is further whitened using PCA (Principal Component Analysis). Finally, the image patch is fed to an SVM (Support Vector Machine) system that uses a statistical procedure to distinguish between different types of obstacles.
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
Efficient graph spanning structures for large database image retrieval. ACPR 2015 : 3rd IAPR Asian Conference on Pattern Recognition, IEEE, 03-06 november 2015, Kuala Lumpur, Malaysia, 2015, pp. 594-598, ISBN 978-1-4799-6100-9
abstract
In this paper we propose a novel method to improve the performance of image retrieval at VLAD descriptor level. The system performs image re-ranking based on relational graphs and neighborhood relations of the top-k candidate results. The technique is able to treat differently various parts of the graph spanning structures by adaptively modifying the similarity score between images. Because most of the processing is performed offline, our algorithm does not influence the retrieval time. By dealing with uneven distribution of images in the dataset, the method is effective and increases the accuracy without relying on low-level information or on the geometrical verification of the considered features
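The reciprocal-neighborhood idea behind this family of re-ranking methods can be illustrated as follows (a simplified toy, not the paper's exact algorithm: the bonus term and k are arbitrary). Pairs of images that appear in each other's top-k lists get their similarity reinforced.

```python
import numpy as np

def rerank_reciprocal(sim, k=5, bonus=0.1):
    """Boost similarity scores for reciprocal top-k neighbours.

    sim : (n, n) symmetric similarity matrix between database images.
    If image j is in i's top-k AND i is in j's top-k, their score is
    strengthened; one-sided matches are left untouched.
    """
    n = sim.shape[0]
    # Top-k neighbour sets (excluding self).
    order = np.argsort(-sim, axis=1)
    topk = [set(order[i][order[i] != i][:k]) for i in range(n)]
    boosted = sim.copy()
    for i in range(n):
        for j in topk[i]:
            if i in topk[j]:                 # reciprocal relation
                boosted[i, j] += bonus
    return boosted

rng = np.random.default_rng(5)
s = rng.random((8, 8)); s = (s + s.T) / 2    # toy symmetric similarities
print(np.allclose(rerank_reciprocal(s), s))  # False: some pairs were boosted
```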
MOCANU Bogdan, TAPU Ruxandra, ZAHARIA Titus
An obstacle categorization system for visually impaired people. SITIS 2015 : 11th International Conference on Signal Image Technology and Internet Based Systems, IEEE, 23-27 november 2015, Bangkok, Thailand, 2015, pp. 147-154, ISBN 978-1-4673-9721-6
abstract
In this paper, we introduce a new framework for obstacle localization and classification. The proposed method is designed to improve the cognition of visually impaired (VI) people, facilitating autonomous navigation in outdoor environments. In the context of computer vision applications, the following contributions are proposed and validated: (i) a new method of selecting a reduced and relevant set of interest points, (ii) a novel descriptor denoted Adaptive Histogram of Oriented Gradients (A-HOG) dedicated to arbitrary categories of objects, (iii) an image re-ranking method at the Vector of Locally Aggregated Descriptors (VLAD) / Bag of Visual Words (BoVW) descriptor level based on graph spanning structures and neighborhood relations. Finally, we demonstrate the performance of the proposed framework (in terms of classification accuracy and computational time) on a challenging video dataset captured with the help of real VI users. The entire framework is completely integrated on an Android smartphone device, and all methods were specifically designed and tuned under the constraint of achieving real-time processing capabilities.
PANOVSKI Dancho, ZAHARIA Titus
Moteur de recherche web avec l'ontologie WOLF. TAIMA 2015 : Traitement et Analyse de l’Information Méthodes et Applications, 11-16 may 2015, Hammamet, Tunisia, 2015
abstract
Information retrieval on the Web in the French language is a domain that requires considerable attention, not only because of its complexity, but also because of the multitude of synonyms and grammatical exceptions to be handled. In this article, we propose a new approach for semantic search engines. This approach uses an existing ontology called Wordnet Libre du Français (WOLF) as a data source to create different types of structured ontology, and Apache Lucene/Solr as the search engine. Tests were carried out using different database models (e.g., MySQL, PostgreSQL, XML, JSON). The results obtained show a significant improvement in search quality and a better precision of the retrieved information.
SALLAMI MZIOU Mallek, ZAHARIA Titus
Buildings detection from lidar data. ISCE 2015 : 19th IEEE International Symposium on Consumer Electronics, 24-26 june 2015, Madrid, Spain, 2015, pp. 1-2, ISBN 978-1-4673-7365-4
abstract
In this paper, we present a novel method for building detection using Lidar point clouds. The proposed method uses SFTA features to identify building contours. The evaluations, carried out on the ISPRS Lidar reference data set, show high detection performances (89%).
TASLI H.Emrah, DEN UYL Tim M., BOUJUT Hugo, ZAHARIA Titus
Real-time facial character animation. FG 2015 : 11th International Conference and Workshops on Automatic Face and Gesture Recognition, IEEE, 04-08 may 2015, Ljubljana, Slovenia, 2015, pp. 1, ISBN 978-1-4799-6026-2
abstract
This demonstration paper presents a real-time facial character animation application where the facial expressions of a person are simultaneously synthesized on a virtual avatar. The proposed method does not require any training or calibration for the person interacting with the system. An Active Appearance Model-based technique is used to track more than 500 points on the face to create the animated expression of the virtual avatar. The sex, age or ethnicity of the subject in front of the camera can also be automatically analyzed, and the visualization of the avatar can hence be adapted accordingly. The application requires a standard webcam, is intended for gaming, entertainment or video conference purposes, and will be presented in a real-time setup during the demo session.
BURSUC Andrei, ZAHARIA Titus, PRETEUX Francoise
Online interactive video content retrieval. ICCE '11 : The International Conference on Consumer Electronics, IEEE, 09-12 january 2011, Las Vegas, United States, 2011, pp. 215-216, ISBN 978-1-4244-8711-0
abstract
This paper describes the on-line video browsing and retrieval platform, so-called OVIDIUS (On-line VIDeo Indexing Universal System). The proposed approach makes it possible to interactively browse video content in a hierarchical and multi-granular manner, while integrating per-segment and per-object navigation, visualization and interaction capabilities. Concerning the content retrieval functionalities, the platform integrates an MPEG-7 search engine with both textual and content-based queries.
ZAHARIA Titus, LAQUET Thomas, VAUCELLE Alain, PRETEUX Francoise
The INVENIO platform for 2D/3D content re-use. ICCE 2011 : IEEE International Conference on Consumer Electronics, IEEE, 09-12 january 2011, Las Vegas, United States, 2011, pp. 83-84, ISBN 978-1-4244-8711-0
URL: http://www.alain-vaucelle.fr/wp/wp-content/uploads/2011/01/The-INVENIO-Platform-for-2D-3D-Content-Re-Use.pdf
abstract
In this paper, we propose a novel image indexing platform, so-called INVENIO (INdexing Visual ENvironment for multimedia Items and Objects). INVENIO offers professional users both 2D and 3D content re-use facilities. Concerning the 2D aspects, the system is entirely based on the ISO/MPEG-7 normative specification. INVENIO integrates a visual metadata extraction engine, annotation tools, image database management tools, as well as appropriate, ergonomic user interfaces. In the case of 3D graphical content, INVENIO makes it possible to exploit existing animation curves for generating new content, thus accelerating the content production process.
BURSUC Andrei, ZAHARIA Titus, PRETEUX Francoise
Inter and intra-video navigation and retrieval in mobile terminals. UBICOMM 2010 : 4th International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, IARIA, 25-30 october 2010, Florence, Italy, 2010, pp. 500-505, ISBN 978-1-61208-100-7
abstract
This paper introduces a novel on-line video browsing and retrieval platform, so-called OVIDIUS (On-line VIDeo Indexing Universal System). In contrast with traditional and commercial mainstream video retrieval platforms, where video content is treated in a more or less monolithic manner (i.e., with global descriptions associated with the whole document), the proposed approach makes it possible to browse and access video content on a finer, per-segment basis. The hierarchical metadata structure exploits the MPEG-7 approach for the structural description of video content. The MPEG-7 description schemes have been enriched here with both semantic and content-based metadata. The developed approach shows all its pertinence within a multi-terminal context, and in particular for video access from mobile devices. The platform has recently (February 2010) been validated within the framework of the Médi@TIC French national project.
BURSUC Andrei, ZAHARIA Titus, PRETEUX Francoise
OVIDIUS: an on-line video indexing universal system. SPIE Optics+Photonics 2010 : Mathematics of Data/Image Coding, Compression, and Encryption with Applications XII, Bellingham : SPIE, 02-04 august 2010, San Diego, United States, 2010, vol. 7799, pp. 77990C:01-77990C:12, ISBN 978-0-8194-8295-2
abstract
This paper introduces a novel on-line video browsing and retrieval platform, so-called OVIDIUS (On-line VIDeo Indexing Universal System). In contrast with traditional and commercial video retrieval platforms, where video content is treated in a more or less monolithic manner (i.e. with global descriptions associated with the whole document), the proposed approach makes it possible to browse and access video content on a finer, per-segment basis. The hierarchical metadata structure exploits the ISO/MPEG-7 approach for the structural description of video content, which provides a multi-granular, hierarchical framework for heterogeneous metadata fusion. The issues of content interaction and visualization, which are of the highest relevance in both the annotation and metadata exploitation stages, are also addressed. Our innovative approach makes it possible to quickly provide a comprehensive overview of complex video documents with minimal time and interaction effort. The developed approach shows all its pertinence within a multi-terminal context, and in particular for video access from mobile devices.
BURSUC Andrei, ZAHARIA Titus, PRETEUX Francoise
Mobile video browsing and retrieval with the OVIDIUS platform. MM '10 : ACM Multimedia International Conference, New York : ACM, 25-29 october 2010, Florence, Italy, 2010, pp. 1659-1662, ISBN 978-1-60558-933-6
abstract
This paper describes a mobile video browsing and retrieval approach, based on the so-called OVIDIUS (On-line VIDeo Indexing Universal System) platform. In contrast with traditional and commercial video retrieval platforms, where video content is treated in a more or less monolithic manner (i.e. with global descriptions associated with the whole document), the proposed approach makes it possible to browse and access video content on a finer, per-segment basis. The hierarchical metadata structure exploits the MPEG-7 approach for the structural description of video content. The MPEG-7 description schemes have been enriched here with both semantic and content-based metadata. The developed approach shows all its pertinence within a multi-terminal context, and in particular for video access from mobile devices. The platform has recently (February 2010) been validated within the framework of the Médi@TIC French national project.
BURSUC Andrei, ZAHARIA Titus, PRETEUX Francoise
Mobile video navigation and retrieval services with the OVIDIUS platform. NEM Summit '10 : Towards Future Media Internet, 13-15 october 2010, Barcelona, Spain, 2010
URL: http://nem-summit.eu/wp-content/plugins/alcyonis-event-agenda//files/NEM_OVIDIUS_ID672_camera_ready1.pdf
abstract
In this paper, we present the on-line video browsing and retrieval platform, so-called OVIDIUS (On-line VIDeo Indexing Universal System), and its application to mobile video services. OVIDIUS makes it possible to browse and access video content on a fine, per-segment basis. The hierarchical metadata structure exploits the MPEG-7 approach for the structural description of video content. The MPEG-7 description schemes have been enriched here with both semantic and content-based metadata. The developed approach shows all its pertinence within a multi-terminal context, and in particular for video access from mobile devices. The platform has been validated within the framework of the Médi@TIC French national project.
BURSUC Andrei, ZAHARIA Titus, DELEZOIDE Bertrand, PRETEUX Francoise
OVIDIUS: an on-line video retrieval platform for multi-terminal access. CBMI '10 : 8th International Workshop on Content-Based Multimedia Indexing, IEEE, 23-25 june 2010, Grenoble, France, 2010, pp. 1-6, ISBN 978-1-4244-8028-9
abstract
This paper introduces a novel on-line video browsing and retrieval platform, so-called OVIDIUS (On-line VIDeo Indexing Universal System). In contrast with traditional and commercial video retrieval platforms, where video content is treated in a more or less monolithic manner (i.e. with global descriptions associated with the whole document), the proposed approach makes it possible to browse and access video content on a finer, per-segment basis. The hierarchical metadata structure exploits the MPEG-7 approach for the structural description of video content. The MPEG-7 description schemes have been enriched here with both semantic and content-based metadata. The developed approach shows all its pertinence within a multi-terminal context, and in particular for video access from mobile devices. The platform has recently (February 2010) been validated within the framework of the Médi@TIC French national project.
PETRE Raluca Diana, ZAHARIA Titus, PRETEUX Francoise
An overview of view-based 2D/3D indexing methods. SPIE Optics+Photonics 2010 : Mathematics of Data/Image Coding, Compression, and Encryption with Applications XII, Bellingham : SPIE, 02-04 august 2010, San Diego, United States, 2010, vol. 7799, pp. 779904:01-779904:12, ISBN 978-0-8194-8295-2
abstract
This paper proposes a comprehensive overview of state-of-the-art 2D/3D, view-based indexing methods. The principle of 2D/3D indexing methods consists of describing 3D models by means of a set of 2D shape descriptors, associated with a set of corresponding 2D views (under the assumption of a given projection model). Notably, such an approach makes it possible to identify 3D objects of interest from 2D images/videos. An experimental evaluation is also proposed, in order to examine the influence of the number of views and of the associated viewing angle selection strategies on the retrieval results. Experiments concern both 3D model retrieval and image recognition from a single view. The results obtained show promising performances, with recognition rates from a single view higher than 66%, which opens interesting perspectives in terms of semantic metadata extraction from still images/videos.
PETRE Raluca Diana, ZAHARIA Titus, PRETEUX Francoise
An experimental evaluation of view-based 2D/3D indexing methods. IEEEI '10 : IEEE 26th Convention of Electrical and Electronics Engineers in Israel, IEEE, 17-20 november 2010, Eilat, Israel, 2010, pp. 000924-000928, ISBN 978-1-4244-8681-6
abstract
This paper proposes an experimental evaluation of state-of-the-art 2D/3D, view-based indexing methods. The principle of 2D/3D indexing methods consists of describing 3D models by means of a set of 2D shape descriptors, associated with a set of corresponding 2D views (under the assumption of a given projection model). Several experiments were conducted in order to examine the influence of the number of views and of the associated viewing angle selection strategies on the retrieval results. Experiments concern both 3D model retrieval and image recognition from a single view. Three 2D shape descriptors were tested in order to determine which of them is the best suited for such approaches. The results obtained show promising performances, with recognition rates from a single view higher than 80%, which opens interesting perspectives in terms of semantic metadata extraction from still images/videos.
TAPU Ruxandra, ZAHARIA Titus, PRETEUX Francoise
A scale-space filtering-based shot detection algorithm. IEEEI '10 : IEEE 26th Convention of Electrical and Electronics Engineers in Israel, IEEE, 17-20 november 2010, Eilat, Israel, 2010, pp. 000919-000923, ISBN 978-1-4244-8681-6
abstract
In this paper we propose an enhanced graph partition shot detector. In order to increase the detection efficiency, we introduce a scale-space filtering-based approach, which makes it possible to derive a relative change ratio measure, instead of an absolute one. Furthermore, in order to reduce the computational complexity, a two-pass analysis approach is proposed. In a first step, the algorithm detects time intervals which can be reliably considered as belonging to the same shot. Abrupt transitions considered as certain are also detected in this stage. In a second step, the analysis is performed only on the uncertain time intervals. The proposed approach shows superior performances with respect to state-of-the-art techniques in terms of both detection efficiency (with a gain of 10% in terms of precision and recall rates) and computational time (on average, 25% faster).
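The relative change ratio idea can be sketched as follows (a loose illustration, not the paper's graph-partition detector: the dissimilarity signal, scales and threshold are invented). A frame is flagged as a cut only if it exceeds a multiple of its smoothed neighborhood level at every scale, instead of an absolute threshold.

```python
import numpy as np

def detect_cuts(frame_dist, sigmas=(1, 2, 4), ratio_thresh=3.5):
    """Detect abrupt shot transitions from a frame-dissimilarity signal.

    frame_dist : (n,) dissimilarity between consecutive frames.
    At each scale, a local baseline is computed by Gaussian smoothing that
    excludes the sample itself; a frame is kept only when its value exceeds
    `ratio_thresh` times the baseline at all scales (a relative criterion).
    """
    n = len(frame_dist)
    candidates = set(range(n))
    for s in sigmas:
        radius = 3 * s
        t = np.arange(-radius, radius + 1)
        g = np.exp(-t**2 / (2.0 * s**2))
        g[radius] = 0.0                   # exclude the sample itself
        g /= g.sum()                      # neighbourhood baseline kernel
        padded = np.pad(frame_dist, radius, mode="edge")
        baseline = np.convolve(padded, g, mode="valid")
        ratio = frame_dist / (baseline + 1e-6)
        candidates &= set(np.flatnonzero(ratio > ratio_thresh))
    return sorted(candidates)

rng = np.random.default_rng(6)
signal = rng.uniform(0.0, 0.2, 200)
signal[[50, 120]] = 1.0            # two planted abrupt transitions
print(detect_cuts(signal))         # expected: [50, 120]
```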
ZAHARIA Titus, VAUCELLE Alain, LAQUET Thomas
La plateforme d'indexation INVENIO : une approche MPEG-7 pour la réutilisation des contenus multimédias. CIDE13 : 13ème colloque international sur le document électronique, 16-17 december 2010, Paris, France, 2010
abstract
This paper proposes a new multimedia indexing platform, so-called INVENIO (INdexing Visual ENvironment for multimedia Items and Objects). Based on the ISO/MPEG-7 standard, INVENIO integrates within a unified platform feature extraction engines, annotation tools, database management utilities and a search engine, with user-friendly and ergonomic user interfaces. For the validation of the INVENIO system, we have considered an industrial application related to the French CapDigital HD3D-IIO structuring project and concerning the re-use of content within the audio-visual production chain. Experiments relate to different audio-visual production chains, including both natural and synthetic content (i.e. cartoons). The proposed indexing solutions show that the exploitation of the MPEG-7 content-based indexing technologies within the INVENIO system makes it possible to achieve significant gains in production time as well as an optimal re-use of digital content under production.
ZAHARIA Titus, VAUCELLE Alain, LAQUET Thomas, PRETEUX Francoise
INVENIO : an MPEG-7 image indexing platform for content re-use within audio-visual production chains. CBMI '10 : 8th International Workshop on Content-Based Multimedia Indexing, IEEE, 23-25 june 2010, Grenoble, France, 2010, pp. 1-6, ISBN 978-1-4244-8028-9
abstract
In this paper, we propose a novel image indexing platform, so-called INVENIO (INdexing Visual ENvironment for multimedia Items and Objects). Entirely based on the ISO/MPEG-7 normative specification, the INVENIO platform offers, within an integrated system, a visual metadata extraction engine, annotation tools, image database management tools, as well as appropriate, ergonomic user interfaces. In order to validate the INVENIO platform, we have considered an industrial application related to the issue of content re-use within an audio-visual production chain, including both natural and synthetic (i.e. cartoons) image content. The proposed solutions demonstrate that the exploitation of the MPEG-7 visual descriptors makes it possible to obtain significant savings in terms of production time/cost, while ensuring an optimal re-use of content. The INVENIO platform has been validated within the framework of the HD3D-IIO structural project of the French CapDigital competitiveness cluster.
GUERCHOUCHE Rachid, BERNIER Olivier, ZAHARIA Titus
Reconstruction volumétrique multirésolution d'objets 3D. RFIA 2008 : 16ème Congrès Francophone AFRIF-AFIA Reconnaissance des Formes et Intelligence Artificielle, 22-25 janvier 2008, Amiens, France, 2008
abstract
This article proposes a new method for reconstructing 3D objects from a small number of views. The method relies on a hierarchical octree representation of a 3D voxel space: starting from a coarse-resolution initialization, an iterative algorithm recovers precise 3D models. The contributions of this paper include a new method for estimating voxel visibility from the cameras, and a new photo-consistency criterion based on histogram comparison. Experimental results show the efficiency of the proposed method, as well as the significant impact of the proposed visibility and photo-consistency estimation procedures on reconstruction quality.
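As a concrete illustration of histogram-based photo-consistency, the minimal sketch below compares, per pair of cameras, the color samples a voxel projects onto; a voxel is kept only if all visible views agree. The histogram-intersection measure and the threshold value are assumptions for illustration, not the exact criterion of the paper.

```python
# Minimal sketch of a histogram-comparison photo-consistency test
# (function names, the intersection measure and the threshold are assumed).
import numpy as np

def color_histogram(pixels, bins=8):
    """Normalized joint RGB histogram of an (N, 3) array of colors in [0, 1]."""
    idx = np.clip((pixels * bins).astype(int), 0, bins - 1)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    hist = np.bincount(flat, minlength=bins ** 3).astype(float)
    return hist / max(hist.sum(), 1.0)

def histogram_intersection(h1, h2):
    return np.minimum(h1, h2).sum()          # 1.0 = identical distributions

def is_photo_consistent(samples_per_view, threshold=0.6):
    """samples_per_view: list of (N_i, 3) color arrays, one per camera that
    sees the voxel. The voxel is kept only if every pair of views agrees."""
    hists = [color_histogram(s) for s in samples_per_view]
    for i in range(len(hists)):
        for j in range(i + 1, len(hists)):
            if histogram_intersection(hists[i], hists[j]) < threshold:
                return False
    return True

# Toy usage: two views drawing similar colors are typically judged consistent.
view_a = np.random.rand(50, 3) * 0.1 + 0.4
view_b = np.random.rand(50, 3) * 0.1 + 0.4
ok = is_photo_consistent([view_a, view_b])
```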
MAMOU Khaled, ZAHARIA Titus, PRETEUX Francoise, KAMOUN Aymen, PAYAN Frédéric, ANTONINI Marc
Two optimizations of the MPEG-4 FAMC standard for enhanced compression of animated 3D meshes. ICIP 2008 : IEEE International Conference on Image Processing, IEEE Signal Processing Society, 12-15 october 2008, San Diego, United States, 2008, pp. 1764-1767, ISBN 978-1-4244-1765-0
abstract
Recently, the MPEG-4 standard adopted a novel technology for the compression of dynamic 3D meshes with constant connectivity and time-varying geometry, referred to as FAMC - Frame-based Animated Mesh Compression. In this paper, we propose two optimizations of the FAMC approach, aiming at improving its compression efficiency. The first one is based on a PCA (Principal Component Analysis) decomposition of the motion compensation error residuals. The second improves the bi-orthogonal (4-2) wavelet coding approach supported by the standard, by introducing an optimal bit allocation procedure combined with an adapted quantization of the wavelet coefficients. Experimental results show that both optimizations lead to significant gains in compression rate (about 20-30%) at low bitrates
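The first optimization lends itself to a compact illustration. The sketch below shows, under assumed data shapes (one row of stacked x/y/z residuals per frame), how a PCA decomposition concentrates the residual energy into a few coefficients that can then be quantized and entropy coded; it is a schematic view, not the standardized algorithm.

```python
# Hedged sketch: PCA compression of motion-compensation residuals
# (data layout and the choice of k are assumptions for illustration).
import numpy as np

def pca_compress(residuals, k):
    """residuals: (F, 3V) array, one row per frame (x, y, z of V vertices).
    Returns what is needed to reconstruct: mean, top-k basis, coefficients."""
    mean = residuals.mean(axis=0)
    centered = residuals - mean
    # SVD gives the principal directions; keep the k most energetic ones.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                          # (k, 3V)
    coeffs = centered @ basis.T             # (F, k) -- this is what gets coded
    return mean, basis, coeffs

def pca_reconstruct(mean, basis, coeffs):
    return mean + coeffs @ basis

# Toy check on random residuals: 20 frames, 100 vertices.
r = np.random.randn(20, 300)
m, b, c = pca_compress(r, k=5)
approx = pca_reconstruct(m, b, c)
```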
MAMOU Khaled, ZAHARIA Titus, PRETEUX Francoise
FAMC: the MPEG-4 standard for animated mesh compression. ICIP 2008 : 15th IEEE International Conference on Image Processing, IEEE Signal Processing Society, 12-15 october 2008, San Diego, United States, 2008, pp. 2676-2679, ISBN 978-1-4244-1765-0
abstract
This paper presents a new compression technique for 3D dynamic meshes, referred to as FAMC - Frame-based Animated Mesh Compression, recently promoted within the MPEG-4 standard as Amendment 2 of part 16 AFX (Animation Framework eXtension). The heart of the method is a skinning model, optimally computed from a frame-based representation and exploited for compression purposes within the framework of a motion compensation strategy. The proposed encoder offers high compression performance (bitrate gains of 60% with respect to the previous MPEG-4 technique and of 20 to 40% with respect to state-of-the-art approaches) and is well suited for compressing both geometric and photometric attributes.
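To make the skinning-based motion compensation concrete, the following sketch (an illustration under assumed data layouts, not the FAMC reference software) predicts each frame by blending per-bone affine transforms with per-vertex weights; only the prediction residual then needs to be coded.

```python
# Illustrative skinning-based motion compensation: per-cluster affine
# transforms blended by vertex weights (shapes are assumptions).
import numpy as np

def skinning_predict(rest_pos, weights, transforms):
    """rest_pos: (V, 3) rest-frame vertices; weights: (V, B) blend weights
    summing to 1 per row; transforms: (B, 3, 4) affine transform per bone."""
    homog = np.hstack([rest_pos, np.ones((len(rest_pos), 1))])   # (V, 4)
    per_bone = np.einsum('bij,vj->vbi', transforms, homog)       # (V, B, 3)
    return np.einsum('vb,vbi->vi', weights, per_bone)            # (V, 3)

# The encoder transmits the transforms and weights, then codes only the
# residual between the true frame and this prediction.
V, B = 100, 4
rest = np.random.randn(V, 3)
w = np.random.rand(V, B); w /= w.sum(axis=1, keepdims=True)
t = np.tile(np.hstack([np.eye(3), np.zeros((3, 1))]), (B, 1, 1))
residual = rest - skinning_predict(rest, w, t)   # identity transforms -> ~0
```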
MAMOU Khaled, STEFANOSKI N., KIRCHHOFFER H., MÜLLER K., ZAHARIA Titus, PRETEUX Francoise, MARPE D., OSTERMANN J.
The new MPEG-4/FAMC standard for animated 3D mesh compression. 3DTV-CON 2008 : The True Vision - Capture, Transmission and Display of 3D Video, IEEE, 28-30 may 2008, Istanbul, Turkey, 2008, pp. 97-100, ISBN 978-1-4244-1755-1
abstract
This paper presents a new compression technique for 3D dynamic meshes, referred to as FAMC - Frame-based Animated Mesh Compression, recently promoted within the MPEG-4 standard as Amendment 2 of part 16 (AFX - Animation Framework eXtension). The FAMC approach combines a model-based motion-compensation strategy with transform/predictive coding of residual errors. First, a skinning motion-compensation model is automatically derived from a frame-based representation. Subsequently, either 1) DCT/lifting wavelets or 2) layer-based predictive coding is employed to exploit remaining spatio-temporal correlations in the residual signal. Both motion model parameters and residual signal components are finally encoded by using context-based adaptive binary arithmetic coding (CABAC). The proposed FAMC encoder offers high compression performance, with bit-rate savings of 60% relative to previous MPEG-4 technology and of 20% to 40% relative to state-of-the-art techniques. FAMC is well suited for compressing both geometric and photometric (normal vectors, colors...) attributes. In addition, FAMC also supports a rich set of functionalities including streaming, scalability (spatial, temporal and quality) and progressive transmission
MAMOU Khaled, ZAHARIA Titus, PRETEUX Francoise
FAMC : la nouvelle norme MPEG-4 pour le codage de maillage 3D animé. RFIA 2008 : 16ème Congrès Francophone AFRIF-AFIA Reconnaissance des Formes et Intelligence Artificielle, 22-25 janvier 2008, Amiens, France, 2008
abstract
This article presents the FAMC - Frame-based Animated Mesh Compression method for compressing animated 3D meshes, recently promoted within the MPEG-4 standard - Part 16 AFX (Animation Framework eXtension). The methodological core relies on the automatic determination of a skinning model, derived from an arbitrary key-frame representation. This model is then exploited to perform motion compensation of the 3D geometry. The proposed FAMC encoder offers efficient compression (1) of the geometry of a dynamic mesh, with gains of 60% with respect to the techniques previously available in the MPEG-4 standard and of 30 to 40% with respect to the state of the art; (2) of the photometric attributes associated with dynamic meshes, such as normals and colors. In addition, FAMC supports the advanced functionalities of streaming, progressive transmission and scalable rendering
MAMOU Khaled, ZAHARIA Titus, PRETEUX Francoise, STEFANOSKI N., OSTERMANN J.
Frame-based compression of animated meshes in MPEG-4. ICME 2008 : IEEE International Conference on Multimedia and Expo, IEEE, 23-26 june 2008, Hannover, Germany, 2008, pp. 1121-1124, ISBN 978-1-4244-2570-9
abstract
This paper presents a new compression technique for 3D dynamic meshes, referred to as FAMC - Frame-based Animated Mesh Compression, recently promoted within the MPEG-4 standard as Amendment 2 of part 16 AFX (Animation Framework eXtension). The FAMC approach combines a model-based motion compensation strategy with transform/predictive coding of residual errors. First, a skinning motion compensation model is automatically computed from a frame-based representation and then encoded. Subsequently, either 1) DCT/lifting wavelets or 2) layer-based predictive coding is employed to exploit remaining spatio-temporal correlations in the residual signal. The proposed encoder offers high compression performance (bit-rate gains of 60% with respect to the previous MPEG-4 technique and of 20% to 40% with respect to state-of-the-art approaches) and is well suited for compressing both geometric and photometric (normal vectors, colors...) attributes. In addition, the FAMC method supports a rich set of functionalities including streaming, scalability (spatial, temporal and quality) and progressive transmission
PRETEUX Francoise, ZAHARIA Titus
FAMC : the emerging MPEG-4 specification for dynamic 3D mesh coding. 2008 NEM Summit : Networked and Electronic Media "Towards Future Media Internet", 13-15 october 2008, Saint Malo, France, 2008
ZAHARIA Titus, MARRE Olivier, PRETEUX Francoise, MONJAUX Perrine
FaceTOON : a unified platform for feature-based cartoon expression generation. Three-dimensional image capture and applications 2008, Bellingham : SPIE, 28-29 january 2008, San Jose, United States, 2008, vol. 6805, pp. 68050S, ISBN 978-0-8194-6977-9
abstract
This paper presents the FaceTOON system, a semi-automatic platform dedicated to the creation of verbal and emotional facial expressions, within the applicative framework of 2D cartoon production. The proposed FaceTOON platform makes it possible to rapidly create 3D facial animations with a minimum amount of user interaction. In contrast with existing commercial 3D modeling software, which usually requires advanced 3D graphics skills and competences from its users, the FaceTOON system is based exclusively on 2D interaction mechanisms, the 3D modeling stage being completely transparent to the user. The system takes as input a neutral 3D face model, free of any facial feature, and a set of 2D drawings representing the desired facial features. A 2D/3D virtual mapping procedure makes it possible to obtain a ready-for-animation model, which can be directly manipulated and deformed to generate expressions. The platform includes a complete set of dedicated tools for 2D/3D interactive deformation, pose management, key-frame interpolation, and MPEG-4 compliant animation and rendering. The FaceTOON system is currently being considered for industrial evaluation and commercialization by the Quadraxis company
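As an illustration of the key-frame interpolation functionality mentioned above, the toy sketch below linearly interpolates a vector of expression parameters between key frames; the parameter layout is an assumption, since FaceTOON's internal representation is not detailed here.

```python
# Toy key-frame interpolation of expression parameters
# (the 4-parameter layout is hypothetical, for illustration only).
import numpy as np

def interpolate_keyframes(keyframes, t):
    """keyframes: time-sorted list of (time, params) pairs, params as 1-D
    arrays; returns the parameters linearly interpolated at time t."""
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = np.searchsorted(times, t)            # first key time >= t
    t0, p0 = keyframes[i - 1]
    t1, p1 = keyframes[i]
    alpha = (t - t0) / (t1 - t0)
    return (1 - alpha) * p0 + alpha * p1

# Blend a neutral face (all zeros) into a smile (one activated blendshape).
keys = [(0.0, np.zeros(4)), (1.0, np.array([0.0, 0.8, 0.0, 0.2]))]
half_smile = interpolate_keyframes(keys, 0.5)
```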
ROYER Julien, NGUYEN Han, MARTINOT Olivier, PREDA Marius, PRETEUX Francoise, ZAHARIA Titus
Interactive TV on parliament session. SPIE Conference on Mathematics of Data/Image Pattern Recognition, Compression, Coding, and Encryption X, with Applications, Washington : SPIE Press, 26-27 august 2007, San Diego, CA, United States, 2007, pp. 1-8, ISBN 978-0-8194-6848-2
abstract
This paper introduces a new interactive mobile TV application dedicated to parliament sessions. The application aims to provide additional information to mobile TV users by automatically inserting, in real time, interactive content (complementary information, the subject of the current session...) into the original TV program, using MPEG-4 streaming video and real-time external information (news, events, databases... from RSS streams, Internet links...). We propose an architecture based on plug-in multimedia analyzers to generate the contextual description of the media, and on an interactive scene generator to dynamically create the related interactive scenes. The description is implemented according to the MPEG-7 standard.
TRAN Son Minh, PREDA Marius, ZAHARIA Titus, PRETEUX Francoise
Advanced Video Compression for Video-On-Demand services. TFIT 2006 : 3rd Taiwanese-French Conference on Information Technology, 28-30 march 2006, Nancy, France, 2006, pp. 251-262
abstract
This work describes the design and implementation of a Video-On-Demand system. The latest video compression technique - MPEG-4 Advanced Video Coding - is deployed to efficiently address the storage of multimedia content. User-friendly interfaces, both for service maintenance and for end-user access, are also addressed. The proposed end-to-end Video-On-Demand solution therefore not only provides the interactive video/movie viewing features of comparable systems, but also proves to be an efficient, simplicity-oriented way of storing, manipulating and retrieving multimedia content
ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
Interactivity, reactivity and programmability: advanced MPEG-4 multimedia applications. ICCE 2006 : International Conference on Consumer Electronics, IEEE, 07-11 january 2006, Las Vegas, NV, United States, 2006, pp. 441-442, ISBN 0-7803-9459-3
abstract
This paper presents recent developments in the creation of advanced iDTV applications with MPEG-4 technologies. Elaborated within the framework of the European ITEA Jules Verne project, the proposed Encyclopedia application exploits sophisticated multimedia representations enriched with dynamic behaviors, enabled by the programmatic functionality supported by the MPEG-4 standard. A comprehensive integration of 3D graphics and audio-visual MPEG technologies together with Java programming is achieved
Autres rapports (16)
CAO Chao, PREDA Marius, ZAHARIA Titus
[PCC] TMC2 CE2.4 crosscheck result. july 2018
abstract
This is a crosscheck report of m42680, "PCC TMC2 CE2.4 Lossless Compression", submitted by Samsung. We crosschecked subtest A on top of absoluteD1 coding under Conditions 2.1, 2.2, 2.5 (random access, lossless geometry) and 2.6 (random access, lossless geometry+color). Our simulation results exactly matched those provided by the proponents
CAO Chao, PREDA Marius, ZAHARIA Titus
Proposal for adaptive orientation of projection planes. july 2018
abstract
The current TMC2 projection method fixes the projection planes to the following six oriented planes, defined by their normals: (1.0, 0.0, 0.0); (0.0, 1.0, 0.0); (0.0, 0.0, 1.0); (-1.0, 0.0, 0.0); (0.0, -1.0, 0.0) and (0.0, 0.0, -1.0) [1]. In the original TMC2 patch generation process, the point cloud is decomposed into a minimum number of patches with smooth boundaries, while minimizing the reconstruction error. The clustering of the point cloud is first initialized by computing the maximal dot product between each plane normal and the point normal, the latter being estimated using PCA (Principal Component Analysis) as described in [2]. A refinement iteration then updates the cluster indices so as to smooth the orientation over each point's nearest neighbours. This method has low complexity and is efficient to use, since the plane normals are all integers. However, the parts of the decoded point cloud whose orientation is neither vertical nor horizontal may contain holes, with significant visual impact, because they lead to a D1 layer whose shape is not compression friendly. This proposal experiments with an adaptive orientation method to reduce the size of the projected 2D patches and the number of missed points for the entire point cloud, while minimizing the distortion
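The initial clustering step described above reduces to a simple argmax over six dot products. The sketch below illustrates it, assuming unit point normals have already been estimated (e.g., by PCA over nearest neighbours); the refinement iteration is omitted.

```python
# Sketch of TMC2-style initial clustering: assign each point to the
# axis-aligned projection plane whose normal best matches the point normal.
import numpy as np

PLANE_NORMALS = np.array([
    [ 1.0, 0.0, 0.0], [0.0,  1.0, 0.0], [0.0, 0.0,  1.0],
    [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0],
])

def initial_clustering(point_normals):
    """point_normals: (N, 3) unit normals -> (N,) index into PLANE_NORMALS."""
    scores = point_normals @ PLANE_NORMALS.T      # (N, 6) dot products
    return scores.argmax(axis=1)

# Toy example: a point facing mostly +z is assigned to plane index 2.
labels = initial_clustering(np.array([[0.1, 0.1, 0.99]]))
```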
CAO Chao, PREDA Marius, ZAHARIA Titus
[PCC] TMC2 CE2.14 crosscheck result. july 2018
abstract
This is a crosscheck report of m42761, "[PCC] TMC2 A differential coding method for patch side information", submitted by Huawei Technologies Inc. We crosschecked condition C2.4 (lossyG, lossyAttr, interRA). Our simulation results exactly matched those provided by the proponents
CAO Chao, PREDA Marius, ZAHARIA Titus
Updates on adaptive orientation of projection planes method. july 2018
abstract
The proposed method, adaptive orientation of projection planes, experiments with an adaptive orientation scheme to find a better-oriented plane for projection. We try to reduce the size of the geometry bitstream by redefining the depth as the distance between the points and the new projection plane. Meanwhile, we need to minimize the number of missed points and the distortion for the entire point cloud. In this document, we describe test results on how our proposal affects the depth values, the packing of the 2D image and the number of missed points
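The following sketch illustrates one plausible reading of the depth redefinition: fit a least-squares plane to a patch (here via PCA, an assumption; the actual plane selection in the proposal may differ) and store the signed point-to-plane distance as the depth.

```python
# Hedged sketch: depth as signed distance to an adaptively oriented plane
# (the PCA best-fit plane is an illustrative choice, not the proposal itself).
import numpy as np

def adaptive_plane_depth(points):
    """points: (N, 3) patch points -> (normal, centroid, per-point depth)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Smallest principal axis of the patch = best-fit plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    depth = centered @ normal              # signed distance to the plane
    return normal, centroid, depth

pts = np.random.randn(200, 3) * np.array([5.0, 5.0, 0.2])   # a flat-ish patch
n, c, d = adaptive_plane_depth(pts)       # depths stay small for a flat patch
```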
CAO Chao, PREDA Marius, ZAHARIA Titus
[PCC] TMC2 CE2.8 crosscheck result on absoluteD1 coding method. july 2018
abstract
This is a crosscheck report of m42680, "PCC TMC2 CE2.4 Lossless Compression", submitted by Samsung. We crosschecked subtest A on top of absoluteD1 coding under C2.3 (lossyG, lossyAttr) and C2.4 (lossyG, lossyAttr, interRA). Our simulation results exactly matched those provided by the proponents
CAO Chao, PREDA Marius, ZAHARIA Titus
Validation process and results of the submitted data for the responses to the Point Cloud Compression CfP. july 2018
abstract
The report presents the procedures, the results and some issues we detected while verifying the submitted data and data sheets. Since all proponents responded to the compression at Cat2 - Lossy Geo. & Color - AI&RA [1], we took our samples mainly from these two categories. The main effort went into verifying whether the data submitted in the data sheets match, whether the applications can be used properly, whether the visualization of the decoded files is of good quality, etc.
MAMOU Khaled, ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
SC3DMC Reference software update. october 2009, MPEG doc. m17061
abstract
Report of 90th MPEG Meeting, ISO/IEC JTC 1/SC 29/WG 11, coding of moving pictures and audio - Xi'an (CN)
MAMOU Khaled, ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
TFAN bitstream syntax update. april 2009, MPEG doc. m16435
abstract
Report of 88th MPEG Meeting, ISO/IEC JTC 1/SC 29/WG 11, coding of moving pictures and audio - Maui (HI)
MAMOU Khaled, ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
Encoder implementation for FAMC. july 2008, MPEG doc. m15683
abstract
MPEG 85 : 85th Meeting Moving Picture Experts Group, July 21-25, Hannover, Germany. Report ISO/IEC JTC 1/SC 29/WG 11 N9963
MAMOU Khaled, ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
Low-complexity approach for static mesh compression. january 2008
abstract
Report from the 83rd meeting of the International Organisation for Standardisation, ISO/IEC JTC 1/SC 29/WG 11 N9558, MPEG2008/M15153, Antalya, TR
MAMOU Khaled, ZAHARIA Titus, PRETEUX Francoise
TFAN : a low complexity approach for static 3D mesh compression. april 2008, MPEG doc. m15438
abstract
MPEG 84 : 84th Meeting Moving Picture Experts Group, April 26 - May 2, Archamps, France. Report ISO/IEC JTC 1/SC 29/WG 11 N9740
MAMOU Khaled, ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
TFAN software description. july 2008, MPEG doc. m15653
abstract
MPEG 85 : 85th Meeting Moving Picture Experts Group, July 21-25, Hannover, Germany. Report ISO/IEC JTC 1/SC 29/WG 11 N9963
MAMOU Khaled, ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
FAMC integration into the MPEG-4 RefSoft. january 2008, MPEG doc. m15150
abstract
The contribution presents the implementation of FAMC in IM1, indicating the supported functionalities. A demonstration of the reference software was shown. 83rd meeting of the International Organisation for Standardisation, ISO/IEC JTC 1/SC 29/WG 11 N9558, MPEG2008/M15150, Antalya, TR
MAMOU Khaled, ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
TFAN stream description. october 2008, MPEG doc. m15825
abstract
MPEG 86 : 86th Meeting Moving Picture Experts Group, October 13-17, Busan, Korea. Report ISO/IEC JTC 1/SC 29/WG 11 N10115
MAMOU Khaled, ZAHARIA Titus, PRETEUX Francoise
On the status of the FAMC encoder source code. april 2008, MPEG doc. m15440
abstract
MPEG 84 : 84th Meeting Moving Picture Experts Group, April 26 - May 2, Archamps, France. Report ISO/IEC JTC 1/SC 29/WG 11 N9740
MAMOU Khaled, ZAHARIA Titus, PRETEUX Francoise
FAMC decoder conformance. january 2008, MPEG doc. m15149
abstract
83rd MPEG Meeting : Moving Picture Experts Group, January 14-18, 2008, Antalya, Turkey. Standardization Report ISO/IEC JTC1/SC29/WG11, MPEG2008/M15149
Communication dans une conférence sans acte ou actes à diffusion limitée (1)
MAMOU Khaled, ZAHARIA Titus, PREDA Marius, PRETEUX Francoise
SC3DMC integration into IM1. 91st MPEG Meeting, 18-22 january 2010, Kyoto, Japan, 2010
abstract
A new version of IM1 including SC3DMC has been released and is available on the SVN. The full set of conformance bitstreams has been generated.